> Ravi,
>
> Sorry not getting back to you sooner.

No problems at all.

> There were many times when I
> used it when it would completely fill the configured block cache. This
> would in turn cause things that were actually being used to be pushed out
> of the BC and performance would suffer

Yes, quite true. Some sort of read storm in the network could cause
problems. I saw a ThrottledIndexInput in old-code. I guess it was meant for
this...

> Now the auto save and load warmup feature would be a completely different
> piece that I think would be of great benefit for restarts / failures.

Yup. This would be a good feature for single-machine failures/restarts. But
I just realized that the worst case for this auto-save/load will be
cluster-wide restarts. Do you think ThrottledIndexInput can serve us well
here too, or do we need some higher-order construct?
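To make that question concrete, the construct I am picturing is a delegating
IndexInput that takes permits from a rate limiter shared by every warmup read
on the node. This is only a sketch against the Lucene 4.x IndexInput API and
Guava's RateLimiter; apart from the ThrottledIndexInput name from old-code,
all the names here are my guesses:

import java.io.IOException;

import org.apache.lucene.store.IndexInput;

import com.google.common.util.concurrent.RateLimiter;

// Sketch only: throttles every read through a RateLimiter shared across
// all warmup inputs on the node, so warmup cannot flood HDFS or the disks.
public class ThrottledIndexInput extends IndexInput {

  private final IndexInput delegate;
  private final RateLimiter limiter; // shared; permits are bytes per second

  public ThrottledIndexInput(IndexInput delegate, RateLimiter limiter) {
    super("ThrottledIndexInput(" + delegate + ")");
    this.delegate = delegate;
    this.limiter = limiter;
  }

  @Override
  public byte readByte() throws IOException {
    limiter.acquire(1); // one permit per byte
    return delegate.readByte();
  }

  @Override
  public void readBytes(byte[] b, int offset, int len) throws IOException {
    limiter.acquire(Math.max(1, len)); // permits proportional to bytes read
    delegate.readBytes(b, offset, len);
  }

  @Override
  public long getFilePointer() {
    return delegate.getFilePointer();
  }

  @Override
  public void seek(long pos) throws IOException {
    delegate.seek(pos);
  }

  @Override
  public long length() {
    return delegate.length();
  }

  @Override
  public void close() throws IOException {
    delegate.close();
  }
}

A single RateLimiter.create(maxBytesPerSecond) shared per shard server would
cap warmup reads on one node. For the cluster-wide restart case, the
higher-order construct might be nothing more than handing each node a smaller
budget, or staggering shard opens, so the combined warmup load on HDFS stays
bounded.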
--
Ravi

On Thu, Apr 23, 2015 at 5:20 PM, Aaron McCurry <[email protected]> wrote:

> Ravi,
>
> Sorry not getting back to you sooner.
>
> I agree that auto saving and loading of the blocks that are actually used
> would be a benefit. However, the previous warmup process in practice pulled
> in too much data that was never accessed. There were many times when I
> used it when it would completely fill the configured block cache. This
> would in turn cause things that were actually being used to be pushed out
> of the BC and performance would suffer. So we ran Blur for some time with
> the warmup process disabled, and for the most part the entire system worked
> better.
>
> Now the auto save and load warmup feature would be a completely different
> piece that I think would be of great benefit for restarts / failures.
>
> Aaron
>
> On Thu, Apr 23, 2015 at 7:29 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > Oh, I am sorry…
> >
> > I see that the Blur warmup code has now been completely removed from
> > the repository. Are there reasons for doing so?
> >
> > I thought auto-saving/loading of the block-cache could benefit nicely
> > from it.
> >
> > --
> > Ravi
> >
> > On Tue, Apr 21, 2015 at 6:06 PM, Ravikumar Govindarajan <
> > [email protected]> wrote:
> >
> > > I was just looking at the Blur warmup logic. I would classify it into
> > > two stages...
> > >
> > > Stage I
> > > It looks like openShard [DistributedIndexServer] submits the warmup
> > > request on a separate warmupExecutor. This is exactly what is needed
> > > for loading an auto-saved block-cache from HDFS...
> > >
> > > Stage II
> > > But when I prodded a little deeper, it got complex: TraceableDirectory,
> > > IndexTracer with thread-local stuff, etc… I could not follow the
> > > code...
> > >
> > > I decided on an impl as follows...
> > >
> > > public class BlockCacheWarmup extends BlurIndexWarmup {
> > >
> > >   @Override
> > >   public void warmBlurIndex(final TableDescriptor table, final String shard,
> > >       IndexReader reader, AtomicBoolean isClosed, ReleaseReader releaseReader,
> > >       AtomicLong pauseWarmup) throws IOException {
> > >     for (each segment) {
> > >       for (each file) {
> > >         // Read the cache-metadata from HDFS...
> > >         // Directly open a CacheIndexInput and populate the block-cache
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > > My question is…
> > >
> > > If I explicitly bypass Stage II {TraceableDirectory and friends} and
> > > just populate the block-cache alone, will this work fine? Am I missing
> > > something obvious?
> > >
> > > Any help is much appreciated…
> > >
> > > --
> > > Ravi
> > > On Fri, Feb 6, 2015 at 2:58 PM, Aaron McCurry <[email protected]>
> > > wrote:
> > >
> > > > Yes, exactly. That way we could provide a set of blocks to be cached
> > > > with priority, so the most important bits get cached first.
> > > >
> > > > Aaron
> > > >
> > > > On Fri, Feb 6, 2015 at 12:43 AM, Ravikumar Govindarajan <
> > > > [email protected]> wrote:
> > > >
> > > > > That's a great idea...
> > > > >
> > > > > You mean that instead of saving the blocks themselves, we can store
> > > > > metadata {block-ids} in HDFS for each file/shard that is written to
> > > > > the block-cache...
> > > > >
> > > > > Opening a shard can then use this metadata to re-populate the hot
> > > > > parts of the files...
> > > > >
> > > > > We also need to handle evictions & file-deletes...
> > > > >
> > > > > Is this what you are hinting at?
> > > > >
> > > > > --
> > > > > Ravi
> > > > >
> > > > > On Thu, Feb 5, 2015 at 7:03 PM, Aaron McCurry <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > On Thu, Feb 5, 2015 at 6:30 AM, Ravikumar Govindarajan <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > I noticed in the BigTable-style impl of Cassandra that they
> > > > > > > store the "Memtable" info periodically onto disk to avoid cold
> > > > > > > start-ups...
> > > > > > >
> > > > > > > Is it possible to do something like that for Blur's block-cache,
> > > > > > > preferably in HDFS itself, so that both cold start-ups and shard
> > > > > > > take-overs don't affect end-user latencies...
> > > > > > >
> > > > > > > In Cassandra's case, the size of a Memtable will typically be
> > > > > > > 2 GB-4 GB. But in the case of Blur, it could even be 100 GB. So
> > > > > > > I don't know if attempting such stuff is a good idea.
> > > > > > >
> > > > > > > Any help is much appreciated...
> > > > > > >
> > > > > > > --
> > > > > > > Ravi
> > > > > >
> > > > > > Yeah, I agree that caches could be very large and storing them in
> > > > > > HDFS could be counterproductive. Also, the block cache represents
> > > > > > what is on the single node, and it's not really broken up by shard
> > > > > > or table. So if a node was restarted without a full cluster
> > > > > > restart, there's no guarantee that the shard server will get back
> > > > > > the same shards it was serving before.
> > > > > >
> > > > > > I like the idea, though. Perhaps we can write out what parts of
> > > > > > which files the cache was storing, with the LRU order. Then any
> > > > > > server that is opening the shard can know what parts of which
> > > > > > files were hot the last time it was open, and could choose to
> > > > > > populate the cache upon shard opening.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > Aaron
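P.S. For the metadata idea in the older part of the thread above, the format
could be as simple as a per-shard file in HDFS holding (file-name, block-id)
pairs written in LRU order, hottest first, so the loader can stop early once
the cache has enough and never push live entries out the way the old warmup
did. A rough sketch only; BlockCacheMetadata and every name in it are
invented, not existing Blur classes:

import java.io.IOException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: saves and loads the hot-block list for one shard. Each entry is
// (file name within the shard, block id within the file), ordered hottest
// first so even a partial load warms the most important blocks.
public class BlockCacheMetadata {

  public static void save(FileSystem fs, Path path,
      List<Map.Entry<String, Long>> hotBlocksInLruOrder) throws IOException {
    FSDataOutputStream out = fs.create(path, true); // overwrite old snapshot
    try {
      out.writeInt(hotBlocksInLruOrder.size());
      for (Map.Entry<String, Long> block : hotBlocksInLruOrder) {
        out.writeUTF(block.getKey());    // file name
        out.writeLong(block.getValue()); // block id
      }
    } finally {
      out.close();
    }
  }

  public static List<Map.Entry<String, Long>> load(FileSystem fs, Path path)
      throws IOException {
    List<Map.Entry<String, Long>> hotBlocks =
        new ArrayList<Map.Entry<String, Long>>();
    FSDataInputStream in = fs.open(path);
    try {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String fileName = in.readUTF();
        long blockId = in.readLong();
        hotBlocks.add(new AbstractMap.SimpleEntry<String, Long>(fileName, blockId));
      }
    } finally {
      in.close();
    }
    return hotBlocks;
  }
}

Evictions and file-deletes could then be handled lazily at load time: skip
any pair whose file no longer exists in the shard directory.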
