I was just looking at Blur's warmup logic. I could classify it into two stages...

Stage I
It looks like openShard [DistributedIndexServer] submits the warmup request on
a separate warmupExecutor. This is exactly what is needed for loading an
auto-saved block-cache from HDFS...

Stage II
But when I dug a little deeper, it got complex: TraceableDirectory,
IndexTracer with thread-local state, etc. I could not follow the code...

I decided on an implementation as follows...

public class BlockCacheWarmup extends BlurIndexWarmup {

  @Override
  public void warmBlurIndex(final TableDescriptor table, final String shard,
      IndexReader reader, AtomicBoolean isClosed, ReleaseReader releaseReader,
      AtomicLong pauseWarmup) throws IOException {
    for (each segment) {
      for (each file) {
        // Read cache metadata from HDFS...
        // Directly open a CacheIndexInput and populate the block-cache
      }
    }
  }
}
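For what it's worth, here is a minimal, self-contained sketch of the warmup
loop I have in mind. The metadata format (one line per file: file name
followed by hot block ids, saved in LRU order) and the names WarmupSketch,
parseMetadata and prefetch are all made up just to illustrate the flow; the
real code would read the metadata from HDFS and pull blocks in via
CacheIndexInput:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: parse saved cache metadata (file name -> hot block
// ids, in LRU order) and "prefetch" each block. Real code would read the
// metadata from HDFS and populate Blur's block-cache via CacheIndexInput.
public class WarmupSketch {

  // Parses lines of the form "fileName blockId blockId ..." into an
  // insertion-ordered map, preserving the saved LRU order.
  static Map<String, List<Long>> parseMetadata(String metadata)
      throws IOException {
    Map<String, List<Long>> hotBlocks = new LinkedHashMap<String, List<Long>>();
    BufferedReader reader = new BufferedReader(new StringReader(metadata));
    String line;
    while ((line = reader.readLine()) != null) {
      String[] parts = line.trim().split("\\s+");
      if (parts.length < 2) {
        continue;
      }
      List<Long> blocks = new ArrayList<Long>();
      for (int i = 1; i < parts.length; i++) {
        blocks.add(Long.parseLong(parts[i]));
      }
      hotBlocks.put(parts[0], blocks);
    }
    return hotBlocks;
  }

  // Stand-in for opening a CacheIndexInput and touching each hot block;
  // returns how many blocks were "warmed".
  static int prefetch(Map<String, List<Long>> hotBlocks) {
    int touched = 0;
    for (Map.Entry<String, List<Long>> e : hotBlocks.entrySet()) {
      for (long blockId : e.getValue()) {
        // A real warmup would read block `blockId` of file e.getKey() here,
        // which pulls it into the block-cache as a side effect.
        touched++;
      }
    }
    return touched;
  }

  public static void main(String[] args) throws IOException {
    String metadata = "_0.fdt 0 1 7\n_0.tim 3 4\n";
    Map<String, List<Long>> hotBlocks = parseMetadata(metadata);
    // prints "2 files, 5 blocks"
    System.out.println(hotBlocks.size() + " files, "
        + prefetch(hotBlocks) + " blocks");
  }
}
```

Eviction and file deletion would of course invalidate some of this metadata,
so stale entries just have to be skipped when the listed file no longer
exists.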

My question is:

If I explicitly bypass Stage II (TraceableDirectory and friends) and just
populate the block-cache alone, will this work fine? Am I missing something
obvious?

Any help is much appreciated…

--
Ravi

On Fri, Feb 6, 2015 at 2:58 PM, Aaron McCurry <[email protected]> wrote:

> Yes exactly.  That way we could provide a set of blocks to be cached with
> priority, so the most important bits get cached first.
>
> Aaron
>
> On Fri, Feb 6, 2015 at 12:43 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > That's a great idea...
> >
> > You meant like instead of saving blocks themselves, we can store metadata
> > {block-ids} for each file/shard in HDFS that is written to block-cache...
> >
> > Opening a shard can then use this metadata to re-populate the hot parts
> > of the files...
> >
> > We also need to handle evictions & file-deletes...
> >
> > Is this what you are hinting at?
> >
> > --
> > Ravi
> >
> > On Thu, Feb 5, 2015 at 7:03 PM, Aaron McCurry <[email protected]>
> > wrote:
> >
> > > On Thu, Feb 5, 2015 at 6:30 AM, Ravikumar Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > I noticed in Cassandra's BigTable-style implementation that they store
> > > > the "Memtable" info periodically onto disk to avoid cold start-ups...
> > > >
> > > > Is it possible to do something like that for Blur's block-cache,
> > > > preferably in HDFS itself so that both cold start-ups and shard
> > > > take-overs don't affect end-user latencies...
> > > >
> > > > In Cassandra's case, the size of the Memtable will typically be
> > > > 2 GB-4 GB. But in the case of Blur, it could even be 100 GB. So I
> > > > don't know if attempting such stuff is a good idea.
> > > >
> > > > Any help is appreciated much...
> > > >
> > >
> > > Yeah I agree that caches could be very large and storing in HDFS could
> > > be counter-productive.  Also the block cache represents what is on the
> > > single node and it's not really broken up by shard or table.  So if a
> > > node was restarted without a full cluster restart there's no guarantee
> > > that the shard server will get the same shards back that it was serving
> > > before.
> > >
> > > I like the idea though, perhaps we can write out what parts of what
> > > files the cache was storing with the LRU order.  Then any server that
> > > is opening the shard can know what parts of what files were hot the
> > > last time it was open.  Then they could choose to populate the cache
> > > upon shard opening.
> > >
> > > Thoughts?
> > >
> > > Aaron
> > >
> > >
> > > >
> > > > --
> > > > Ravi
> > > >
> > >
> >
>
