Aaron,

For a PoC, please check the sources and comments on the BLUR-290 issue; sketches of both cache ideas appear after the quoted thread below.
--
Ravi

On Wednesday, November 13, 2013, Aaron McCurry wrote:

> On Mon, Nov 11, 2013 at 1:54 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > As you pointed out, there will be some penalty for this cache,
> > especially when the number of rowids increases. Interacting with this
> > cache during IndexReader open/close is going to have some overhead.
> >
> > Instead, can we decouple this and make it a "write-through cache"?
> >
> > Ex: Map<SegName, Ref-Counted-PrimeDocBitSet>
> >
> > The codec will publish new data to this cache on flush [new-segment
> > creation].
> >
> > Every access can be ref-counted, and during segment removal [merges],
> > obsolete entries can be queued and removed from the cache once the
> > ref-count drops to zero.
> >
> > Typically, I feel that this cache should be free of IndexReader
> > open/close, but rather live till BlurNRTIndex.close() is called. Then
> > the overhead is really minimal.
>
> Not sure I follow you here. Are you talking about the file-based bitsets
> that used to back the per-segment filters? If so, then I think they
> already live with the shard (until BlurNRTIndex.close()) as well as with
> the segment. So if the segment is still living, the filters can be
> accessed. If the filter is used, it's pulled into memory. If the filter
> is written, the block cache is already set up to be a write-through
> cache.
>
> If I got this all wrong, can you describe things again? :-)
>
> Thanks,
> Aaron
>
> > What do you think?
> >
> > --
> > Ravi
> >
> > On Sat, Nov 9, 2013 at 9:52 AM, Aaron McCurry <[email protected]>
> > wrote:
> >
> > > On Fri, Nov 8, 2013 at 2:22 AM, Ravikumar Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > Wow, this saving of filters in a custom codec is super-cool.
> > > >
> > > > Let me describe the problem I was thinking about.
> > > >
> > > > Assuming we have the RAMDir-and-disk swap approach, I was just
> > > > starting to deliberate on the read path.
> > > >
> > > > PrimeDocCache looks like a challenge for this approach, as the
> > > > same row will now be present across multiple segments. Each
> > > > segment will have a "PrimeDoc" field per row, but during merge
> > > > this info gets duplicated for each row.
> > > >
> > > > I was thinking of recording the "start-doc" of each row to a
> > > > separate file, via a custom codec, like you have done for
> > > > FilterCache.
> > > >
> > > > During warm-up, it can read the entire file containing
> > > > "start-docs" and populate the PrimeDocCache.
> > >
> > > I like the idea. I tend to prototype to figure out how hard and how
> > > performant a solution will be. :-) Let's see if we can make it work.
> > >
> > > Aaron
> > >
> > > > --
> > > > Ravi
> > > >
> > > > On Fri, Nov 8, 2013 at 5:04 AM, Aaron McCurry <[email protected]>
> > > > wrote:
> > > >
> > > > > So the filter cache is really just a placeholder for keeping
> > > > > Lucene Filters around between queries. The DefaultFilterCache
> > > > > class does nothing; however, I have implemented one that I make
> > > > > use of regularly.
> > > > > https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-core/src/main/java/org/apache/blur/manager/AliasBlurFilterCache.java;h=92491d0ceb3e7ce09902110e3bac5fa485959dab;hb=apache-blur-0.2
> > > > >
> > > > > If you write your own and you want to build a logical bitset
> > > > > cache for the filter (so it's faster), take a look at the
> > > > > "org.apache.blur.filter.FilterCache" class. It wraps an existing
> > > > > filter, loads it into the block cache, and writes it to disk
> > > > > (via the Directory). The filters live with the segment, so if
> > > > > the segment gets removed, so will the on-disk "filter" and the
> > > > > in-memory cache of it.
> > > > >
> > > > > On Thu, Nov 7, 2013 at 8:08 AM, Ravikumar Govindarajan <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Great. In such a case, it will benefit me in building a
> > > > > > "rowid" filter cache.
> > > > > >
> > > > > > I saw Blur having a DefaultFilterCache class. Is this the
> > > > > > class that needs to be customized? Will NRT re-opens
> > > > > > [reader close/open, with applyAllDeletes] take care of
> > > > > > auto-invalidating such a cache?
> > > > >
> > > > > Filtering is a query operation, so for each new segment (NRT
> > > > > re-opens) the Lucene Filter API handles creating a new filter
> > > > > for that segment. The
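
For reference, here is a minimal sketch of the write-through, ref-counted cache Ravi proposes above (the Map<SegName, Ref-Counted-PrimeDocBitSet> idea). Everything here is hypothetical, not a Blur API: the class name, the publish/acquire/release/retire contract, and the assumption that the codec calls publish() on flush are invented for illustration. The cache instance itself would live with the shard until BlurNRTIndex.close().

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.lucene.util.FixedBitSet;

/**
 * Hypothetical write-through prime-doc cache, keyed by segment name.
 * The codec publishes a bitset when a segment is flushed; readers pair
 * acquire()/release(); retire() is called when a merge removes the
 * segment, and the entry disappears once its ref-count drops to zero.
 */
public class WriteThroughPrimeDocCache {

  private static final class Entry {
    final FixedBitSet primeDocs;
    int refCount = 1; // the cache itself holds one reference until retire()

    Entry(FixedBitSet primeDocs) {
      this.primeDocs = primeDocs;
    }
  }

  private final ConcurrentMap<String, Entry> bySegment =
      new ConcurrentHashMap<String, Entry>();

  /** Called by the codec on flush (new-segment creation). */
  public void publish(String segmentName, FixedBitSet primeDocs) {
    bySegment.put(segmentName, new Entry(primeDocs));
  }

  /** Readers take a reference; every acquire() needs a matching release(). */
  public FixedBitSet acquire(String segmentName) {
    Entry e = bySegment.get(segmentName);
    if (e == null) {
      return null; // unknown segment - caller falls back to the index
    }
    synchronized (e) {
      if (e.refCount == 0) {
        return null; // lost a race with final removal
      }
      e.refCount++;
      return e.primeDocs;
    }
  }

  public void release(String segmentName) {
    Entry e = bySegment.get(segmentName);
    if (e != null) {
      dropReference(segmentName, e);
    }
  }

  /** Called once when a merge removes the segment. */
  public void retire(String segmentName) {
    Entry e = bySegment.get(segmentName);
    if (e != null) {
      dropReference(segmentName, e); // drop the cache's own reference
    }
  }

  private void dropReference(String segmentName, Entry e) {
    synchronized (e) {
      if (--e.refCount == 0) {
        bySegment.remove(segmentName, e); // last user gone: evict
      }
    }
  }
}

Because the cache keeps one reference of its own until retire(), reader release() calls can never evict an entry for a live segment, so the entry lifecycle stays independent of IndexReader open/close, as proposed in the thread.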

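And a sketch of the warm-up half of the idea: reading back a per-segment "start-docs" file written by a custom codec and turning it into a prime-doc bitset without touching the inverted index. The file suffix (".pdoc") and layout (a vInt count followed by delta-encoded doc ids) are invented for this example; Directory, IndexInput, IOContext, and FixedBitSet are the real Lucene classes.

import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.util.FixedBitSet;

/**
 * Hypothetical warm-up: a custom codec has written the start-doc of
 * every row to a per-segment file at flush time. Reading it back
 * populates the prime-doc bitset for that segment.
 */
public final class PrimeDocWarmup {

  private PrimeDocWarmup() {}

  public static FixedBitSet loadPrimeDocs(Directory dir, String segmentName,
      int maxDoc) throws IOException {
    FixedBitSet primeDocs = new FixedBitSet(maxDoc);
    try (IndexInput in =
        dir.openInput(segmentName + ".pdoc", IOContext.READONCE)) {
      int count = in.readVInt(); // number of rows in the segment
      int doc = 0;
      for (int i = 0; i < count; i++) {
        doc += in.readVInt(); // delta-encoded start-doc of each row
        primeDocs.set(doc);
      }
    }
    return primeDocs;
  }
}

The codec's write side would emit the same layout at flush time, and the result of loadPrimeDocs() could be fed straight into publish() on the cache sketched above during warm-up.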