Re: CacheIndexInput cacheSize

Ravikumar Govindarajan Fri, 02 Dec 2016 04:50:35 -0800

One thing I was wondering: does block-cache acquire locks of any kind
during reads?

I don't use the 'read-then-cache' construct at all, so I was wondering
whether it is safe to eliminate locks (if any) on the read path.
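
To make the question concrete, here is a minimal sketch of what a probe-only
read path could look like. Everything in it is hypothetical
(ProbeOnlyBlockCache, put, probe); it is not Blur's actual block-cache code.
It only assumes that blocks are inserted on the write-through path and that
reads never insert. ConcurrentHashMap retrievals generally do not block, so
the probe itself takes no explicit lock. A real cache that evicts or reuses
buffers would still need some guard against a block being recycled while it
is being copied.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical probe-only block cache; NOT Blur's actual BlockCache API. */
final class ProbeOnlyBlockCache {
    private final int blockSize;
    // Retrievals from ConcurrentHashMap generally do not block,
    // so probe() below acquires no explicit lock.
    private final ConcurrentHashMap<Long, byte[]> blocks = new ConcurrentHashMap<>();

    ProbeOnlyBlockCache(int blockSize) {
        this.blockSize = blockSize;
    }

    /** Write-through path: the only place where blocks are inserted. */
    void put(long blockId, byte[] data) {
        // Defensive copy, padded or truncated to exactly one block.
        blocks.put(blockId, Arrays.copyOf(data, blockSize));
    }

    /**
     * Read path: copies len bytes out of a cached block if it is present.
     * Returns false on a miss and never inserts into the cache.
     */
    boolean probe(long blockId, int blockOffset, byte[] dst, int dstOffset, int len) {
        byte[] block = blocks.get(blockId);
        if (block == null) {
            return false; // miss: caller falls through to the underlying file
        }
        System.arraycopy(block, blockOffset, dst, dstOffset, len);
        return true;
    }
}
```

This sketch has no eviction, which is exactly the part where locking or
reference counting usually comes back in.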


On Mon, Oct 24, 2016 at 7:07 PM, Aaron McCurry <[email protected]> wrote:

> On Fri, Oct 21, 2016 at 1:41 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > Our application makes use of 'write-thru-block-cache' only. During
> > search/merge-reads, we have modified block-cache code to only probe the
> > block-cache and avoid inserting to it.
> >
> > In such a usage scenario, I was thinking about introducing a
> > 'readBufferSize'  (default=1KB) in CacheIndexInput. From block-cache or
> > underlying file we read only 'readBufferSize' data & adjust counters
> > accordingly when it's a short-circuit read...
> >
> > You think it could be made workable?
> >
>
> Yeah it should be.
>
>
> >
> > > Another idea could be to bypass the cache directory during merges and
> > > read directly from the hdfsdirectory.  Then perhaps you could take
> > > advantage of the SC reads without having to deal with the cache
> > > directly.
> >
> >
> > This is what we are currently evaluating & it looks to be a safe bet
> >
>
> Ok, let me know if you have any questions.
>
>
> >
> > --
> > Ravi
> >
> > On Fri, Oct 21, 2016 at 3:26 AM, Aaron McCurry <[email protected]> wrote:
> >
> > > In my experience I too have used block cache sizes in the 64KB range
> > > for the same reasons you listed.  The biggest of which was that we
> > > were running upwards of 100GB caches, and 1K block cache sizes are not
> > > really possible at that size.  The biggest problem with the compaction
> > > is with the .tim file; the rest of the files are mostly sequential
> > > reads, but because that file is a tree it tends to jump all over the
> > > place during compaction.  I would recommend, if you want to speed up
> > > compaction (merges), to allow the .tim files to be put into block cache
> > > during the merge (i.e. turn quiet reads off for those files).  This of
> > > course could flood your cache with data that you are about to remove,
> > > so if you have the cache space it's the easiest solution.
> > >
> > > Another idea could be to bypass the cache directory during merges and
> > > read directly from the hdfsdirectory.  Then perhaps you could take
> > > advantage of the SC reads without having to deal with the cache
> > > directly.
> > >
> > > Aaron
> > >
> > > On Thu, Oct 20, 2016 at 3:53 AM, Ravikumar Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > We have set a fairly large cacheSize of 64KB in block-cache to avoid
> > > > too many keys, GC pressure etc...
> > > >
> > > > But CacheIndexInput tries to read 64KB of data during a cache-miss &
> > > > fills up the CacheValue. When doing short-circuit-reads, this could
> > > > turn out to be excessive, no? For comparison, Lucene uses only 1KB
> > > > buffers for the same..
> > > >
> > > > Do you think this will likely affect performance of searches, albeit
> > > > in a minor way?
> > > >
> > > > --
> > > > Ravi
> > > >
> > >
> >
>
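
The 'readBufferSize' idea discussed in the thread comes down to capping how
many bytes are fetched on each read instead of always filling a whole 64KB
cache block. The sketch below shows one way that cap could be computed. The
class and method names (ReadSizing, bytesToFetch) and the parameter names are
hypothetical and are not part of CacheIndexInput. It assumes reads should not
cross a cache-block boundary or run past the end of the file.

```java
/** Hypothetical helper; NOT part of Blur's CacheIndexInput. */
final class ReadSizing {
    private ReadSizing() {}

    /**
     * Number of bytes to fetch for a read starting at position, capped by:
     *  - readBufferSize (e.g. 1KB, as suggested in the thread),
     *  - the end of the cache block containing position,
     *  - the end of the file.
     */
    static int bytesToFetch(long position, long fileLength,
                            int cacheBlockSize, int readBufferSize) {
        // First byte offset after the cache block that contains position.
        long blockEnd = (position / cacheBlockSize + 1) * cacheBlockSize;
        // Never read past the block or past the end of the file.
        long limit = Math.min(blockEnd, fileLength);
        return (int) Math.min(readBufferSize, limit - position);
    }
}
```

With a 64KB cache block and a 1KB read buffer, a read at the start of a block
would fetch 1KB, while a read 100 bytes before a block boundary would fetch
only those 100 bytes.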
