Oh I am sorry…

I see that the Blur warmup code has now been completely removed from the
repository. Is there a reason for that?

I thought auto-saving/loading of the block-cache could benefit from it nicely.

--
Ravi

On Tue, Apr 21, 2015 at 6:06 PM, Ravikumar Govindarajan <
[email protected]> wrote:

> I was just looking at the Blur warmup logic. I could classify it into two
> stages...
>
> Stage I
> It looks like openShard [DistributedIndexServer] submits the warmup request
> on a separate warmupExecutor. This is exactly what is needed for loading an
> auto-saved block-cache from HDFS...
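>
> Conceptually something like this (an illustrative sketch only, not the
> actual DistributedIndexServer code; warmupExecutor is assumed to be a
> plain ExecutorService):
>
> warmupExecutor.submit(new Runnable() {
>   @Override
>   public void run() {
>     // Hypothetical body: load the auto-saved block-cache metadata for
>     // this shard from HDFS and re-populate the cache off the serving path.
>   }
> });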
>
> Stage II
> But when I dug a little deeper, it got complex: TraceableDirectory,
> IndexTracer with thread-local state, etc. I could not follow the code...
>
> I decided on an implementation as follows...
>
> public class BlockCacheWarmup extends BlurIndexWarmup {
>
>   @Override
>   public void warmBlurIndex(final TableDescriptor table, final String shard,
>       IndexReader reader, AtomicBoolean isClosed, ReleaseReader releaseReader,
>       AtomicLong pauseWarmup) throws IOException {
>     // pseudocode:
>     for (each segment) {
>       for (each file) {
>         // Read the cache-metadata for this file from HDFS...
>         // Directly open a CacheIndexInput and populate the block-cache
>       }
>     }
>   }
> }
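>
> The body of those loops would look roughly like this (illustrative only;
> the 8K block size and the savedBlockIds() helper are assumptions on my
> side, and cacheDirectory stands for whatever Directory sits in front of
> the block-cache):
>
> IndexInput input = cacheDirectory.openInput(fileName, IOContext.READ);
> byte[] buf = new byte[8192];                    // assumed block size
> for (long blockId : savedBlockIds(fileName)) {  // hypothetical helper
>   long offset = blockId * 8192L;
>   if (offset >= input.length()) {
>     continue;  // the file changed since the metadata was saved
>   }
>   int len = (int) Math.min(buf.length, input.length() - offset);
>   input.seek(offset);
>   input.readBytes(buf, 0, len);  // the read itself populates the cache
> }
> input.close();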
>
> My question is…
>
> If I explicitly bypass Stage II (TraceableDirectory and friends) and just
> populate the block-cache alone, will this work fine? Am I missing something
> obvious?
>
> Any help is much appreciated…
>
> --
> Ravi
>
> On Fri, Feb 6, 2015 at 2:58 PM, Aaron McCurry <[email protected]> wrote:
>
>> Yes, exactly.  That way we could provide a set of blocks to be cached with
>> priority, so the most important bits get cached first.
>>
>> Aaron
>>
>> On Fri, Feb 6, 2015 at 12:43 AM, Ravikumar Govindarajan <
>> [email protected]> wrote:
>>
>> > That's a great idea...
>> >
>> > You mean that instead of saving the blocks themselves, we can store
>> > metadata {block-ids} in HDFS for each file/shard that is written to the
>> > block-cache...
>> >
>> > Opening a shard can then use this metadata to re-populate the hot parts
>> > of the files...
>> >
>> > We also need to handle evictions & file-deletes...
>> >
>> > Is this what you are hinting at?
>> >
>> > --
>> > Ravi
>> >
>> > On Thu, Feb 5, 2015 at 7:03 PM, Aaron McCurry <[email protected]>
>> > wrote:
>> >
>> > > On Thu, Feb 5, 2015 at 6:30 AM, Ravikumar Govindarajan <
>> > > [email protected]> wrote:
>> > >
>> > > > I noticed that in Cassandra's BigTable-style implementation, they
>> > > > periodically store the "Memtable" info to disk to avoid cold
>> > > > start-ups...
>> > > >
>> > > > Is it possible to do something like that for Blur's block-cache,
>> > > > preferably in HDFS itself, so that both cold start-ups and shard
>> > > > take-overs don't affect end-user latencies...
>> > > >
>> > > > In Cassandra's case, the size of the Memtable will typically be
>> > > > 2 GB-4 GB, but in the case of Blur it could even be 100 GB. So I
>> > > > don't know if attempting such a thing is a good idea.
>> > > >
>> > > > Any help is much appreciated...
>> > > >
>> > >
>> > > Yeah, I agree that the caches could be very large and storing them in
>> > > HDFS could be counterproductive.  Also, the block cache represents what
>> > > is on a single node, and it's not really broken up by shard or table.
>> > > So if a node was restarted without a full cluster restart, there's no
>> > > guarantee that the shard server will get back the same shards it was
>> > > serving before.
>> > >
>> > > I like the idea though. Perhaps we could write out which parts of which
>> > > files the cache was storing, along with the LRU order.  Then any server
>> > > that is opening the shard would know which parts of which files were hot
>> > > the last time it was open, and could choose to populate the cache upon
>> > > shard opening.
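>> > >
>> > > Something along these lines (just a sketch; the CacheKey type, the
>> > > keysInLruOrder() method and the "blockcache.meta" file name are all
>> > > made up for illustration, and FileSystem/Path are the usual Hadoop
>> > > classes):
>> > >
>> > > FSDataOutputStream out =
>> > >     fileSystem.create(new Path(shardPath, "blockcache.meta"));
>> > > for (CacheKey key : blockCache.keysInLruOrder()) {  // assumed hottest first
>> > >   out.writeUTF(key.getFileName());  // which file
>> > >   out.writeLong(key.getBlockId());  // which block of that file
>> > > }
>> > > out.close();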
>> > >
>> > > Thoughts?
>> > >
>> > > Aaron
>> > >
>> > >
>> > > >
>> > > > --
>> > > > Ravi
>> > > >
>> > >
>> >
>>
>
>
