Our application uses the block cache in write-through mode only. During search/merge reads, we have modified the block-cache code to only probe the cache and avoid inserting into it.
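For illustration, here is a minimal sketch of that probe-only read path (class and method names are hypothetical, not our actual block-cache code): writes still populate the cache, while search/merge reads only look up blocks and never insert on a miss.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only -- not the actual Blur block-cache classes.
public class ProbeOnlyBlockCache {

  private final ConcurrentMap<Long, byte[]> blocks = new ConcurrentHashMap<Long, byte[]>();

  // Write-through path: blocks written by the application are always cached.
  public void putOnWrite(long blockId, byte[] data) {
    blocks.put(blockId, data);
  }

  // Search/merge read path: probe only. A miss returns null and does NOT
  // populate the cache, so large merge reads cannot evict hot search blocks.
  public byte[] probe(long blockId) {
    return blocks.get(blockId);
  }
}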
In such a usage scenario, I was thinking about introducing a 'readBufferSize' (default=1KB) in CacheIndexInput. When it's a short-circuit read, we would read only 'readBufferSize' bytes from the block cache or the underlying file and adjust the counters accordingly. Do you think it could be made workable?

> Another idea could be to bypass the cache directory during merges and read
> directly from the hdfsdirectory. Then perhaps you could take advantage of
> the SC reads without having to deal with the cache directly.

This is what we are currently evaluating & it looks to be a safe bet.

--
Ravi

On Fri, Oct 21, 2016 at 3:26 AM, Aaron McCurry <[email protected]> wrote:

> In my experience I too have used block cache sizes in the 64KB range for the
> same reasons you listed. The biggest of which was because we were running
> upwards of 100GB caches and 1KB block cache sizes are not really possible at
> that size. The biggest problem with the compaction is with the .tim file;
> the rest of the files are mostly sequential reads, but because that file is
> a tree it tends to jump all over the place during compaction. I would
> recommend, if you want to speed up compaction (merges), allowing the .tim
> files to be put into the block cache during the merge (i.e. turn quiet reads
> off for those files). This of course could flood your cache with data that
> you are about to remove, so if you have the cache space it's the easiest
> solution.
>
> Another idea could be to bypass the cache directory during merges and read
> directly from the hdfsdirectory. Then perhaps you could take advantage of
> the SC reads without having to deal with the cache directly.
>
> Aaron
>
> On Thu, Oct 20, 2016 at 3:53 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > We have set a fairly large cacheSize of 64KB in the block cache to avoid
> > too many keys, GC pressure, etc.
> >
> > But CacheIndexInput tries to read 64KB of data during a cache miss and
> > fills up the CacheValue. When doing short-circuit reads, this could turn
> > out to be excessive, no? For comparison, Lucene uses only 1KB buffers for
> > the same.
> >
> > Do you think this will affect search performance, albeit in a minor way?
> >
> > --
> > Ravi
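To make the readBufferSize proposal at the top of this message concrete, here is a rough sketch of the cache-miss path (all names here -- readOnMiss, RawInput, readBufferSize -- are assumptions for discussion; the real CacheIndexInput internals differ).

import java.io.IOException;

// Hypothetical sketch of the proposed readBufferSize idea, not Blur's
// actual CacheIndexInput.
public class BoundedCacheMissReader {

  public interface RawInput {
    // Reads up to len bytes starting at fileOffset; returns bytes actually read.
    int read(long fileOffset, byte[] dst, int dstOffset, int len) throws IOException;
  }

  private final RawInput raw;          // underlying file (short-circuit read path)
  private final int readBufferSize;    // proposed default: 1024 bytes
  private long filePointer;

  public BoundedCacheMissReader(RawInput raw, int readBufferSize) {
    this.raw = raw;
    this.readBufferSize = readBufferSize;
  }

  // Cache-miss path: fetch at most readBufferSize bytes instead of filling a
  // whole 64KB block, then advance the position counter by the amount read.
  public int readOnMiss(byte[] dst, int offset, int requested) throws IOException {
    int toRead = Math.min(requested, readBufferSize);
    int read = raw.read(filePointer, dst, offset, toRead);
    if (read > 0) {
      filePointer += read;
    }
    return read;
  }

  public long getFilePointer() {
    return filePointer;
  }
}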

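For the bypass-the-cache-directory-during-merges idea discussed above, one possible shape is a thin wrapper that routes merge reads straight to the raw HdfsDirectory. This is only a sketch, assuming a Lucene version that provides FilterDirectory and IOContext.Context.MERGE; the wrapper itself and its wiring are illustrative, not existing Blur code.

import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

// Hypothetical wrapper: normal reads go through the cache directory,
// merge reads bypass it and hit HDFS directly (where SC reads apply).
public class MergeBypassDirectory extends FilterDirectory {

  private final Directory hdfsDir; // raw HdfsDirectory

  public MergeBypassDirectory(Directory cacheDir, Directory hdfsDir) {
    super(cacheDir); // delegate everything else to the cache directory
    this.hdfsDir = hdfsDir;
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    // Merges announce themselves through IOContext, so their large, mostly
    // sequential reads can skip the block cache entirely.
    if (context.context == IOContext.Context.MERGE) {
      return hdfsDir.openInput(name, context);
    }
    return in.openInput(name, context);
  }
}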