Aaron,

For a PoC, please check the sources and comments on the BLUR-290 issue; sketches of both cache ideas appear after the quoted thread below.
--
Ravi

On Wednesday, November 13, 2013, Aaron McCurry wrote:

> On Mon, Nov 11, 2013 at 1:54 AM, Ravikumar Govindarajan <
> [email protected]> wrote:
>
> > As you pointed out, there will be some penalty for this cache,
> > especially when the number of rowids increases. Interacting with this
> > cache during IndexReader open/close is going to have some overhead.
> >
> > Instead, can we decouple this and make it a "write-through cache"?
> >
> > Ex: Map<SegName, Ref-Counted-PrimeDocBitSet>
> >
> > The codec will publish new data to this cache on flush [new-segment
> > creation].
> >
> > Every access can be ref-counted, and during segment removal [merges],
> > obsolete entries can be queued and removed from the cache once the
> > ref-count drops to zero.
> >
> > Typically, I feel that this cache should be free of IndexReader
> > open/close, but rather live till BlurNRTIndex.close() is called. Then
> > the overhead is really minimal.
>
> Not sure I follow you here. Are you talking about the file-based bitsets
> that used to back the per-segment filters? If so, then I think they
> already live with the shard (until BlurNRTIndex.close()) as well as with
> the segment. So if the segment is still living, the filters can be
> accessed. If the filter is used, it's pulled into memory. If the filter
> is written, the block cache is already set up to be a write-through
> cache.
>
> If I got this all wrong, can you describe things again? :-)
>
> Thanks,
> Aaron
>
> > What do you think?
> >
> > --
> > Ravi
> >
> > On Sat, Nov 9, 2013 at 9:52 AM, Aaron McCurry <[email protected]>
> > wrote:
> >
> > > On Fri, Nov 8, 2013 at 2:22 AM, Ravikumar Govindarajan <
> > > [email protected]> wrote:
> > >
> > > > Wow, this saving of filters in a custom codec is super-cool.
> > > >
> > > > Let me describe the problem I was thinking about.
> > > >
> > > > Assuming we have the RAMDir-and-disk swap approach, I was just
> > > > starting to deliberate on the read path.
> > > >
> > > > PrimeDocCache looks like a challenge for this approach, as the
> > > > same row will now be present across multiple segments. Each
> > > > segment will have a "PrimeDoc" field per row, but during merge
> > > > this info gets duplicated for each row.
> > > >
> > > > I was thinking of recording the "start-doc" of each row to a
> > > > separate file, via a custom codec, like you have done for
> > > > FilterCache.
> > > >
> > > > During warm-up, it can read the entire file containing
> > > > "start-docs" and populate the PrimeDocCache.
> > >
> > > I like the idea. I tend to prototype to figure out how hard and how
> > > performant a solution will be. :-) Let's see if we can make it work.
> > >
> > > Aaron
> > >
> > > > --
> > > > Ravi
> > > >
> > > > On Fri, Nov 8, 2013 at 5:04 AM, Aaron McCurry <[email protected]>
> > > > wrote:
> > > >
> > > > > So the filter cache is really just a placeholder for keeping
> > > > > Lucene Filters around between queries. The DefaultFilterCache
> > > > > class does nothing; however, I have implemented one that I make
> > > > > use of regularly.
> > > > > https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=blur-core/src/main/java/org/apache/blur/manager/AliasBlurFilterCache.java;h=92491d0ceb3e7ce09902110e3bac5fa485959dab;hb=apache-blur-0.2
> > > > >
> > > > > If you write your own and you want to build a logical bitset
> > > > > cache for the filter (so it's faster), take a look at the
> > > > > "org.apache.blur.filter.FilterCache" class. It wraps an existing
> > > > > filter, loads it into the block cache, and writes it to disk
> > > > > (via the Directory). The filters live with the segment, so if
> > > > > the segment gets removed, so will the on-disk "filter" and the
> > > > > in-memory cache of it.
> > > > >
> > > > > On Thu, Nov 7, 2013 at 8:08 AM, Ravikumar Govindarajan <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > Great. In such a case, it will benefit me in building a
> > > > > > "rowid" filter cache.
> > > > > >
> > > > > > I saw Blur having a DefaultFilterCache class. Is this the
> > > > > > class that needs to be customized? Will NRT re-opens
> > > > > > [reader close/open, with applyAllDeletes] take care of
> > > > > > auto-invalidating such a cache?
> > > > >
> > > > > Filtering is a query operation, so for each new segment (NRT
> > > > > re-opens) the Lucene Filter API handles creating a new filter
> > > > > for that segment. The
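
For reference, here is a minimal sketch of the write-through, ref-counted cache Ravi proposes above (the Map<SegName, Ref-Counted-PrimeDocBitSet> idea). Everything here is hypothetical, not a Blur API: the class name, the publish/acquire/release/retire contract, and the assumption that the codec calls publish() on flush are invented for illustration. The cache instance itself would live with the shard until BlurNRTIndex.close().

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.lucene.util.FixedBitSet;

/**
 * Hypothetical write-through prime-doc cache, keyed by segment name.
 * The codec publishes a bitset when a segment is flushed; readers pair
 * acquire()/release(); retire() is called when a merge removes the
 * segment, and the entry disappears once its ref-count drops to zero.
 */
public class WriteThroughPrimeDocCache {

  private static final class Entry {
    final FixedBitSet primeDocs;
    int refCount = 1; // the cache itself holds one reference until retire()

    Entry(FixedBitSet primeDocs) {
      this.primeDocs = primeDocs;
    }
  }

  private final ConcurrentMap<String, Entry> bySegment =
      new ConcurrentHashMap<String, Entry>();

  /** Called by the codec on flush (new-segment creation). */
  public void publish(String segmentName, FixedBitSet primeDocs) {
    bySegment.put(segmentName, new Entry(primeDocs));
  }

  /** Readers take a reference; every acquire() needs a matching release(). */
  public FixedBitSet acquire(String segmentName) {
    Entry e = bySegment.get(segmentName);
    if (e == null) {
      return null; // unknown segment - caller falls back to the index
    }
    synchronized (e) {
      if (e.refCount == 0) {
        return null; // lost a race with final removal
      }
      e.refCount++;
      return e.primeDocs;
    }
  }

  public void release(String segmentName) {
    Entry e = bySegment.get(segmentName);
    if (e != null) {
      dropReference(segmentName, e);
    }
  }

  /** Called once when a merge removes the segment. */
  public void retire(String segmentName) {
    Entry e = bySegment.get(segmentName);
    if (e != null) {
      dropReference(segmentName, e); // drop the cache's own reference
    }
  }

  private void dropReference(String segmentName, Entry e) {
    synchronized (e) {
      if (--e.refCount == 0) {
        bySegment.remove(segmentName, e); // last user gone: evict
      }
    }
  }
}

Because the cache keeps one reference of its own until retire(), reader release() calls can never evict an entry for a live segment, so the entry lifecycle stays independent of IndexReader open/close, as proposed in the thread.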

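And a sketch of the warm-up half of the idea: reading back a per-segment "start-docs" file written by a custom codec and turning it into a prime-doc bitset without touching the inverted index. The file suffix (".pdoc") and layout (a vInt count followed by delta-encoded doc ids) are invented for this example; Directory, IndexInput, IOContext, and FixedBitSet are the real Lucene classes.

import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.util.FixedBitSet;

/**
 * Hypothetical warm-up: a custom codec has written the start-doc of
 * every row to a per-segment file at flush time. Reading it back
 * populates the prime-doc bitset for that segment.
 */
public final class PrimeDocWarmup {

  private PrimeDocWarmup() {}

  public static FixedBitSet loadPrimeDocs(Directory dir, String segmentName,
      int maxDoc) throws IOException {
    FixedBitSet primeDocs = new FixedBitSet(maxDoc);
    try (IndexInput in =
        dir.openInput(segmentName + ".pdoc", IOContext.READONCE)) {
      int count = in.readVInt(); // number of rows in the segment
      int doc = 0;
      for (int i = 0; i < count; i++) {
        doc += in.readVInt(); // delta-encoded start-doc of each row
        primeDocs.set(doc);
      }
    }
    return primeDocs;
  }
}

The codec's write side would emit the same layout at flush time, and the result of loadPrimeDocs() could be fed straight into publish() on the cache sketched above during warm-up.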