I think once we can efficiently apply cheap random-access DocIdSets the way deleted docs are applied (i.e., distributed down to all SegmentTermDocs), it'd be useful for this filter manager to also pre-fold deletes in, so that SegmentTermDocs would only have a single random-access DocIdSet to check.
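
To make the pre-folding idea a bit more concrete, here is a minimal sketch. It assumes both the filter and the deleted docs are already available as bit sets; java.util.BitSet stands in for OpenBitSet, and the class and method names are made up for illustration, not existing Lucene API.

```java
import java.util.BitSet;

/** Hypothetical helper; java.util.BitSet stands in for Lucene's OpenBitSet. */
final class FoldDeletes {
    /**
     * Returns a new bit set holding only the docs that the filter accepts
     * and that are not deleted, so a consumer needs a single random-access
     * check per doc instead of two.
     */
    static BitSet foldDeletes(BitSet filterBits, BitSet deletedDocs) {
        BitSet folded = (BitSet) filterBits.clone(); // leave the cached filter bits untouched
        folded.andNot(deletedDocs);                  // clear every doc that is marked deleted
        return folded;
    }
}
```

The folded set would have to be rebuilt whenever the deletes change, so this is only worthwhile when the same filter is reused across many searches.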

Mike

On Wed, Jun 3, 2009 at 4:03 AM, Shai Erera<ser...@gmail.com> wrote:
> Thanks Paul!
>
> I'll work on such a utility (which takes a Filter and reads it into an
> OpenBitSet, SortedVIntList) and then post back in case you'll be interested
> in adopting it, and change CWF to use it, or something else.
>
> Shai
>
> On Tue, Jun 2, 2009 at 9:35 PM, Paul Elschot <paul.elsc...@xs4all.nl> wrote:
>>
>> On Tuesday 02 June 2009 16:39:06 Shai Erera wrote:
>> > Hi
>> >
>> > I read CWF today and initially I thought this is going to cache a Filter
>> > in-memory for me, so that I can more efficiently use it for subsequent
>> > searches. But I learned that all it does is cache the DocIdSet returned
>> > by the wrapped Filter.
>> >
>> > This is good in and of itself, but I wonder if we shouldn't go the extra
>> > mile and wrap stuff in memory for Filters which don't operate from
>> > memory.
>>
>> It was good until QueryWrapperFilter returned a Scorer instead of a disi
>> based on an (Open)BitSet.
>>
>> > For example - I have a Filter which reads information from a Payload as
>> > it's iterated on, so it doesn't keep anything in memory (it's per-user
>> > information, so I haven't decided yet if I can afford caching it
>> > in-memory and whether it will be beneficial). Caching that sort of
>> > Filter by CWF will obviously not improve anything.
>> >
>> > I'm not sure what to do here:
>> > 1. Just reflect that in the javadoc (it is very confusing saying "Wraps
>> > another filter's result and caches it", which is not true)
>> > 2. Introduce a class which takes a Filter and loads it into memory (I
>> > think I read an issue/discussion about this), to an OpenBitSet for
>> > example (but we need to know the number of results in advance, or grow
>> > the array as we go along).
>> > 3. Don't use CWF, write a "load-a-Filter-into-in-memory-Filter" utility,
>> > and cache the Filters w/ the user as Key.
>>
>> For that, one could subclass CWF and override the docIdSetToCache method
>> to return an OpenBitSetDISI constructed from the given disi.
>>
>> > I will probably need to do the second part of (3) anyway, so I'm asking
>> > whether such a utility is useful to exist in Lucene, and perhaps there's
>> > already one (I thought I read somewhere about the ability to execute a
>> > Query and get back a Filter, or use the results as a Filter)?
>>
>> That is what QueryWrapperFilter does.
>>
>> > I looked at QueryWrapperFilter, but it doesn't seem to give me what I
>> > need, since its getDocIdSet method returns an iterator which is the
>> > Scorer of the Query that it wraps.
>>
>> The Scorer seems to be what you need, but there are cheaper disis, see
>> below.
>>
>> > Anyway, I think the documentation of CWF should be fixed and made
>> > clearer.
>> >
>> > Any thoughts?
>>
>> The basic problem is that disis from DocIdSets come in two variations:
>> expensive ones e.g. based on a query, and cheap ones based e.g. on an
>> OpenBitSet or on a SortedVIntList.
>> One would normally want to cache a DocIdSet that provides a cheap disi.
>>
>> For the javadocs of the current CWF it could be sufficient to mention more
>> prominently that the default CWF caches the given DocIdSet, basically
>> assuming that its disi is cheap.
>>
>> But it might be a good idea to change the default implementation to check
>> whether the given DocIdSet is an OpenBitSet, and use that to be cached in
>> that case, and otherwise provide an OpenBitSetDISI.
>>
>> Regards,
>> Paul Elschot
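
For reference, the "cache it as-is if it is already a bit set, otherwise materialize it" check that Paul describes could look roughly like the sketch below. This is not Lucene code: DocIds, BitDocIds and toCacheable are hypothetical stand-ins for DocIdSet, OpenBitSet and a docIdSetToCache override, and java.util.BitSet is used so the example is self-contained.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

/** Hypothetical stand-ins; none of these types are Lucene's. */
final class CheapDocIdSets {
    /** Minimal stand-in for a DocIdSet: it can iterate over its doc IDs. */
    interface DocIds {
        PrimitiveIterator.OfInt iterator();
    }

    /** Cheap, random-access set backed by a bit set (plays the role of OpenBitSet). */
    static final class BitDocIds implements DocIds {
        final BitSet bits;

        BitDocIds(BitSet bits) {
            this.bits = bits;
        }

        @Override
        public PrimitiveIterator.OfInt iterator() {
            return bits.stream().iterator();
        }
    }

    /**
     * Returns a set that is cheap to cache and re-iterate: the input itself
     * if it is already bit-set backed, otherwise a bit-set copy built by
     * draining the (possibly expensive) iterator exactly once.
     */
    static DocIds toCacheable(DocIds set, int maxDoc) {
        if (set instanceof BitDocIds) {
            return set; // already cheap, cache as-is
        }
        BitSet bits = new BitSet(maxDoc); // sized up front; BitSet also grows on demand
        PrimitiveIterator.OfInt it = set.iterator();
        while (it.hasNext()) {
            bits.set(it.nextInt());
        }
        return new BitDocIds(bits);
    }
}
```

In real Lucene code the same decision would presumably sit in a CWF subclass that overrides docIdSetToCache, as suggested above; the sketch only shows the branching and the one-time drain of the expensive iterator.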