Thanks Paul! I'll work on such a utility (which takes a Filter and reads it into an OpenBitSet or SortedVIntList) and then post back, in case you're interested in adopting it and changing CWF to use it, or something else.
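A rough sketch of what such a utility might look like. It deliberately uses stand-in types rather than the real Lucene classes: `DocIterator`, `FilterMaterializer`, `materialize` and `fromSortedArray` are hypothetical names made up for illustration, and `java.util.BitSet` stands in for OpenBitSet. The only point is to show a (possibly expensive) iterator being drained once into a bit set that grows as it goes, so the number of results need not be known in advance.

```java
import java.util.BitSet;

/** Stand-in for a doc id iterator; this is NOT the Lucene DocIdSetIterator API. */
interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Returns the next doc id in increasing order, or NO_MORE_DOCS when exhausted. */
    int nextDoc();
}

/** Hypothetical utility: loads an iterator's doc ids into memory. */
final class FilterMaterializer {
    private FilterMaterializer() {}

    /** Drains the iterator once and records every doc id in an in-memory bit set. */
    static BitSet materialize(DocIterator it) {
        // java.util.BitSet grows on demand, so no result count is needed up front.
        BitSet bits = new BitSet();
        for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            bits.set(doc);
        }
        return bits;
    }

    /** Illustration-only iterator over a sorted array of doc ids. */
    static DocIterator fromSortedArray(final int[] docs) {
        return new DocIterator() {
            private int pos = 0;

            public int nextDoc() {
                return pos < docs.length ? docs[pos++] : NO_MORE_DOCS;
            }
        };
    }
}
```

The resulting bit set could then be kept in a map with the user as key, so later searches iterate the in-memory bits instead of re-reading the payloads.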
Shai

On Tue, Jun 2, 2009 at 9:35 PM, Paul Elschot <paul.elsc...@xs4all.nl> wrote:

> On Tuesday 02 June 2009 16:39:06 Shai Erera wrote:
> > Hi
> >
> > I read CWF today and initially I thought this is going to cache a Filter
> > in-memory for me, so that I can more efficiently use it for subsequent
> > searches. But I learned that all it does is cache the DocIdSet returned by
> > the wrapped Filter.
> >
> > This is good in and on itself, but I wonder if we shouldn't go the extra
> > mile and wrap stuff in memory for Filters which don't operate from memory.
>
> It was good until QueryWrapperFilter returned a Scorer instead of a disi
> based on an (Open)BitSet.
>
> > For example - I have a Filter which reads information from a Payload as it's
> > iterated on, so it doesn't keep anything in memory (it's per-user
> > information, so I haven't decided yet if I can afford caching it in-memory
> > and whether it will be beneficial). Caching that sort of Filter by CWF will
> > obviously not improve anything.
> >
> > I'm not sure what to do here:
> > 1. Just reflect that in the javadoc (it is very confusing saying "Wraps
> > another filter's result and caches it", which is not true)
> > 2. Introduce a class which takes a Filter and loads it into memory (I think
> > I read an issue/discussion about this), to an OpenBitSet for example (but we
> > need to know the number of results in advance, or grow the array as we go
> > along).
> > 3. Don't use CWF, write a "load-a-Filter-into-in-memory-Filter" utility, and
> > cache the Filters w/ the user as Key.
>
> For that, one could subclass CWF and override the docIdSetToCache method
> to return an OpenBitSetDISI constructed from the given disi.
>
> > I will probably need to do the second part of (3) anyway, so I'm asking
> > whether such a utility is useful to exist in Lucene, and perhaps there's
> > already one (I thought I read somewhere about the ability to execute a Query
> > and get back a Filter, or use the results as a Filter)?
>
> That is what QueryWrapperFilter does.
>
> > I looked at
> > QueryWrapperFilter, but it doesn't seem to give me what I need, since its
> > getDocIdSet method returns an iterator which is the Scorer of the Query that
> > it wraps.
>
> The Scorer seems to be what you need, but there are cheaper disis, see
> below.
>
> > Anyway, I think the documentation of CWF should be fixed and made clearer.
> >
> > Any thoughts?
>
> The basic problem is that disis from DocIdSets come in two variations: expensive
> ones e.g. based on a query, and cheap ones based e.g. on an OpenBitSet or on
> a SortedVIntList.
> One would normally want to cache a DocIdSet that provides a cheap disi.
>
> For the javadocs of the current CWF it could be sufficient to mention more
> prominently that the default CWF caches the given DocIdSet, basically
> assuming that it's disi is cheap.
>
> But it might be a good idea to change the default implementation to check
> whether the given DocIdSet is an OpenBitSet, and use that to be cached in
> that case, and otherwise provide an OpenBitSetDISI.
>
> Regards,
> Paul Elschot