I see this <https://gerrit.cloudera.org/#/c/691/> change has reset
--max_cached_file_handles to 0, effectively disabling HDFS file handle
caching. Any idea why?

I don't think it consumes too much memory (~20MB for 10k cached handles).
The reason I'm asking is that, without caching, we'd have to open a new
handle for every scan range, and hence issue a new RPC every time.
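
To make that concrete, here is a rough, self-contained sketch of the kind of
path-keyed LRU cache that file handle caching implies. The names
(FileHandleCache, OpenHandle) are made up for illustration and this is not
Impala's actual implementation; the point is just that a cache hit reuses an
existing handle instead of paying the open RPC again.

#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for an open HDFS file handle (roughly 2kB of associated data each).
struct OpenHandle {
  std::string path;
};

// Hypothetical bounded LRU cache keyed by file path. Assumes capacity >= 1;
// the real flag's value of 0 disables caching entirely instead.
class FileHandleCache {
 public:
  explicit FileHandleCache(size_t capacity) : capacity_(capacity) {}

  OpenHandle* Get(const std::string& path) {
    auto it = index_.find(path);
    if (it != index_.end()) {
      // Hit: move the entry to the front of the LRU list and reuse the handle.
      lru_.splice(lru_.begin(), lru_, it->second);
      return &it->second->second;
    }
    // Miss: open a new handle (the expensive open RPC in the real system).
    lru_.emplace_front(path, OpenHandle{path});
    index_[path] = lru_.begin();
    if (lru_.size() > capacity_) Evict();
    return &lru_.front().second;
  }

 private:
  void Evict() {
    // A real implementation would also close the evicted handle here.
    index_.erase(lru_.back().first);
    lru_.pop_back();
  }

  size_t capacity_;
  std::list<std::pair<std::string, OpenHandle>> lru_;
  std::unordered_map<std::string,
                     std::list<std::pair<std::string, OpenHandle>>::iterator> index_;
};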

// The number of cached file handles defines how much memory can be used per backend for
// caching frequently used file handles. Currently, we assume that approximately 2kB data
// are associated with a single file handle. 10k file handles will thus reserve ~20MB
// data. The actual amount of memory that is associated with a file handle can be larger
// or smaller, depending on the replication factor for this file or the path name.
DEFINE_uint64(max_cached_file_handles, 0, "Maximum number of HDFS file handles "
    "that will be cached. Disabled if set to 0.");
