I see that this change <https://gerrit.cloudera.org/#/c/691/> reset --max_cached_file_handles to 0, effectively disabling HDFS file handle caching. Any idea why?
I don't think it consumes too much memory (~20MB for 10k cached handles). The reason I'm asking is that, without caching, we'd have to create a new handle for every scan range and hence issue a new RPC every time. For reference, the flag definition reads:

```cpp
// The number of cached file handles defines how much memory can be used per backend for
// caching frequently used file handles. Currently, we assume that approximately 2kB data
// are associated with a single file handle. 10k file handles will thus reserve ~20MB
// data. The actual amount of memory that is associated with a file handle can be larger
// or smaller, depending on the replication factor for this file or the path name.
DEFINE_uint64(max_cached_file_handles, 0, "Maximum number of HDFS file handles "
    "that will be cached. Disabled if set to 0.");
```
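As a rough sanity check on the ~20MB figure, the memory math from that comment can be sketched as follows. The ~2kB-per-handle number is the assumption stated in the code comment (the actual per-handle footprint varies with replication factor and path length), and the helper name here is mine, not anything from the Impala codebase:

```python
def estimated_cache_memory_bytes(num_handles, bytes_per_handle=2 * 1024):
    """Rough estimate of memory reserved by the HDFS file handle cache,
    using the ~2kB-per-handle assumption from the flag's comment."""
    return num_handles * bytes_per_handle

# 10k handles at ~2kB each is about 20MB, matching the comment's estimate.
print(estimated_cache_memory_bytes(10_000))  # 20480000 bytes, i.e. ~20MB
```

So even a fairly large cache of 10k handles stays in the tens of megabytes per backend, which is why the memory cost doesn't seem like a reason to disable it by default.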
