Thanks Dan.

On Thu, Dec 8, 2016 at 11:57 AM, Daniel Hecht <[email protected]> wrote:
> We don't have the tests for it. I think what we should have is a test that
> does stuff to the files from outside of Impala (and therefore invalidates
> the file handles), and make sure nothing "bad" happens (like crashes). And
> maybe some invalidate metadata / refreshes (since IIRC we use the mtime at
> those points as part of the handle key).
>
> Also, I don't remember what the eviction policy is. We should probably
> verify that memory usage is bounded (both on Impala and hdfs side).

Per my understanding, we only evict after an mtime mismatch. Else we'd cache
it forever.

> On Thu, Dec 8, 2016 at 11:14 AM, Bharath Vissapragada <
> [email protected]> wrote:
>
> > I see this <https://gerrit.cloudera.org/#/c/691/> change has reset
> > --max_cached_file_handles to 0, effectively disabling hdfs file handle
> > caching. Any idea why?
> >
> > I don't think it consumes too much memory (~20MB for 10k cached handles).
> > The reason I'm asking this is, without caching, we'd have to create a new
> > handle for every scan range and hence a new RPC every time.
> >
> > // The number of cached file handles defines how much memory can be used per backend for
> > // caching frequently used file handles. Currently, we assume that approximately 2kB data
> > // are associated with a single file handle. 10k file handles will thus reserve ~20MB
> > // data. The actual amount of memory that is associated with a file handle can be larger
> > // or smaller, depending on the replication factor for this file or the path name.
> > DEFINE_uint64(max_cached_file_handles, 0, "Maximum number of HDFS file handles "
> >     "that will be cached. Disabled if set to 0.");
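
To make the mechanism being discussed concrete, here is a minimal, hypothetical
sketch of a handle cache keyed by (path, mtime) and capped at a fixed number of
entries. The names (FileHandleCache, HdfsFileHandle, GetOrCreate) are invented
for illustration and this is not Impala's actual implementation; in particular
the LRU capacity bound is an assumption for the example, whereas the thread
above suggests the real cache may only evict on an mtime mismatch.

// Illustrative only: a toy cache keyed by (file path, mtime), capped at a
// fixed number of entries. A file rewritten outside Impala gets a new mtime,
// so its old handle is never returned again and eventually ages out.
#include <cstdint>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct HdfsFileHandle {
  // In the real cache this would wrap an hdfsFile plus bookkeeping (~2kB each).
  std::string path;
  int64_t mtime;
};

class FileHandleCache {
 public:
  explicit FileHandleCache(size_t capacity) : capacity_(capacity) {}

  // Returns a cached handle for (path, mtime) or creates a new one.
  std::shared_ptr<HdfsFileHandle> GetOrCreate(const std::string& path,
                                              int64_t mtime) {
    if (capacity_ == 0) {  // Caching disabled, like max_cached_file_handles=0.
      return std::make_shared<HdfsFileHandle>(HdfsFileHandle{path, mtime});
    }
    const std::string key = path + "#" + std::to_string(mtime);
    auto it = map_.find(key);
    if (it != map_.end()) {
      lru_.splice(lru_.begin(), lru_, it->second);  // Mark as most recent.
      return it->second->second;
    }
    auto handle = std::make_shared<HdfsFileHandle>(HdfsFileHandle{path, mtime});
    lru_.emplace_front(key, handle);
    map_[key] = lru_.begin();
    if (lru_.size() > capacity_) {  // Bound memory: drop the least recently used.
      map_.erase(lru_.back().first);
      lru_.pop_back();
    }
    return handle;
  }

 private:
  using Entry = std::pair<std::string, std::shared_ptr<HdfsFileHandle>>;
  size_t capacity_;
  std::list<Entry> lru_;
  std::unordered_map<std::string, std::list<Entry>::iterator> map_;
};

With a capacity of 10k entries and roughly 2kB per handle, the bound works out
to the ~20MB figure quoted from the flag's comment above.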
