We don't have tests for it. I think what we should have is a test that
modifies the files from outside of Impala (and therefore invalidates
the file handles) and makes sure nothing "bad" happens (like crashes). And
maybe some invalidate metadata / refresh cases too, since IIRC we use the
mtime at those points as part of the handle key.
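
To make the mtime point concrete, here's a minimal sketch (illustrative
only, not Impala's actual code) of a handle-cache key that includes the
file's mtime, so a REFRESH / INVALIDATE METADATA that observes a new mtime
misses the stale entry instead of reusing a handle to a file that changed
underneath us:

#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical key type; the field names are made up for illustration.
struct FileHandleCacheKey {
  std::string filename;
  int64_t mtime;  // last-modified time taken from the catalog metadata

  bool operator==(const FileHandleCacheKey& other) const {
    return std::tie(filename, mtime) == std::tie(other.filename, other.mtime);
  }
};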

Also, I don't remember what the eviction policy is. We should probably
verify that memory usage is bounded (on both the Impala and HDFS sides).
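
To make "bounded" concrete, something like the following LRU sketch (my
own illustration, not Impala's implementation) is what I'd expect: the
entry count, and therefore the Impala-side memory, is capped, and the
least recently used handle is evicted on overflow. void* stands in here
for the cached hdfsFile handle:

#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class FileHandleCache {
 public:
  explicit FileHandleCache(size_t capacity) : capacity_(capacity) {}

  // Insert or refresh an entry, evicting the LRU entry if over capacity.
  void Put(const std::string& key, void* handle) {
    auto it = map_.find(key);
    if (it != map_.end()) {
      lru_.erase(it->second);
      map_.erase(it);
    }
    lru_.emplace_front(key, handle);
    map_[key] = lru_.begin();
    if (lru_.size() > capacity_) {
      // A real implementation would also hdfsCloseFile() the evicted handle.
      map_.erase(lru_.back().first);
      lru_.pop_back();
    }
  }

  // Return the cached handle (marking it most recently used), or nullptr.
  void* Get(const std::string& key) {
    auto it = map_.find(key);
    if (it == map_.end()) return nullptr;
    lru_.splice(lru_.begin(), lru_, it->second);
    return it->second->second;
  }

 private:
  using Entry = std::pair<std::string, void*>;
  size_t capacity_;
  std::list<Entry> lru_;
  std::unordered_map<std::string, std::list<Entry>::iterator> map_;
};

Verifying the HDFS side (DataNode/NameNode resources pinned by open
handles) would need a cluster-level test rather than a unit test.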

On Thu, Dec 8, 2016 at 11:14 AM, Bharath Vissapragada
<[email protected]> wrote:

> I see this <https://gerrit.cloudera.org/#/c/691/> change has reset
> --max_cached_file_handles to 0, effectively disabling HDFS file handle
> caching. Any idea why?
>
> I don't think it consumes too much memory (~20MB for 10k cached handles).
> The reason I'm asking is that without caching we'd have to create a new
> handle for every scan range, and hence issue a new RPC every time.
>
> // The number of cached file handles defines how much memory can be used per
> // backend for caching frequently used file handles. Currently, we assume that
> // approximately 2kB data are associated with a single file handle. 10k file
> // handles will thus reserve ~20MB data. The actual amount of memory that is
> // associated with a file handle can be larger or smaller, depending on the
> // replication factor for this file or the path name.
> DEFINE_uint64(max_cached_file_handles, 0, "Maximum number of HDFS file handles "
>     "that will be cached. Disabled if set to 0.");
>
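
(For context on the numbers above: at the assumed ~2kB per handle, 10,000
cached handles work out to roughly 20MB per backend. If that estimate
holds, re-enabling the cache should just be a matter of passing the flag
at impalad startup, e.g.

impalad --max_cached_file_handles=10000

though the exact way flags are passed depends on the deployment.)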
