I think that once we can efficiently apply cheap random-access docIDSets
the way deleted docs are applied (i.e., distributed down to all
SegmentTermDocs), it'd be useful for this filter manager to also
pre-fold deletes in, so that SegmentTermDocs would only have a
single random-access docIDSet to check.

Mike
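
For illustration only (this is not code from the thread): a minimal sketch
of the pre-folding idea above, assuming java.util.BitSet as a stand-in for
a random-access docIDSet. FoldedFilter and accept are hypothetical names,
not Lucene API.

```java
import java.util.BitSet;

/** Hypothetical helper; java.util.BitSet stands in for a random-access docIDSet. */
final class FoldedFilter {
    // A set bit means: the doc passes the filter AND is not deleted.
    private final BitSet accepted;

    /**
     * @param filterBits  docs accepted by the filter
     * @param deletedBits docs marked as deleted in the segment
     */
    FoldedFilter(BitSet filterBits, BitSet deletedBits) {
        BitSet folded = (BitSet) filterBits.clone(); // leave the caller's set untouched
        folded.andNot(deletedBits);                  // clear every deleted doc
        this.accepted = folded;
    }

    /** One random-access lookup instead of "in filter && not deleted". */
    boolean accept(int docId) {
        return accepted.get(docId);
    }
}
```

The fold happens once, when the filter is built, so each per-document
check is a single bit lookup. The folded set would have to be rebuilt
whenever the deletes change.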

On Wed, Jun 3, 2009 at 4:03 AM, Shai Erera<ser...@gmail.com> wrote:
> Thanks Paul !
>
> I'll work on such a utility (which takes a Filter and reads it into an
> OpenBitSet or SortedVIntList) and then post back in case you're interested
> in adopting it and changing CWF to use it, or something else.
>
> Shai
>
> On Tue, Jun 2, 2009 at 9:35 PM, Paul Elschot <paul.elsc...@xs4all.nl> wrote:
>>
>> On Tuesday 02 June 2009 16:39:06 Shai Erera wrote:
>> > Hi
>> >
>> > I read CWF today and initially I thought this is going to cache a Filter
>> > in-memory for me, so that I can more efficiently use it for subsequent
>> > searches. But I learned that all it does is cache the DocIdSet returned by
>> > the wrapped Filter.
>> >
>> > This is good in and of itself, but I wonder if we shouldn't go the extra
>> > mile and wrap stuff in memory for Filters which don't operate from
>> > memory.
>>
>> It was good until QueryWrapperFilter returned a Scorer instead of a disi
>> based on an (Open)BitSet.
>>
>> > For example - I have a Filter which reads information from a Payload as
>> > it's iterated on, so it doesn't keep anything in memory (it's per-user
>> > information, so I haven't decided yet if I can afford caching it
>> > in-memory and whether it will be beneficial). Caching that sort of Filter
>> > by CWF will obviously not improve anything.
>> >
>> > I'm not sure what to do here:
>> > 1. Just reflect that in the javadoc (it is very confusing saying "Wraps
>> > another filter's result and caches it", which is not true)
>> > 2. Introduce a class which takes a Filter and loads it into memory (I
>> > think I read an issue/discussion about this), to an OpenBitSet for
>> > example (but we need to know the number of results in advance, or grow
>> > the array as we go along).
>> > 3. Don't use CWF, write a "load-a-Filter-into-in-memory-Filter" utility,
>> > and cache the Filters w/ the user as Key.
>>
>> For that, one could subclass CWF and override the docIdSetToCache method
>> to return an OpenBitSetDISI constructed from the given disi.
>>
>> > I will probably need to do the second part of (3) anyway, so I'm asking
>> > whether such a utility is useful to exist in Lucene, and perhaps there's
>> > already one (I thought I read somewhere about the ability to execute a
>> > Query and get back a Filter, or use the results as a Filter)?
>>
>> That is what QueryWrapperFilter does.
>>
>> > I looked at QueryWrapperFilter, but it doesn't seem to give me what I
>> > need, since its getDocIdSet method returns an iterator which is the
>> > Scorer of the Query that it wraps.
>>
>> The Scorer seems to be what you need, but there are cheaper disis, see
>> below.
>>
>> >
>> > Anyway, I think the documentation of CWF should be fixed and made
>> > clearer.
>> >
>> > Any thoughts?
>>
>> The basic problem is that disis from DocIdSets come in two variations:
>> expensive ones, e.g. based on a query, and cheap ones, e.g. based on an
>> OpenBitSet or on a SortedVIntList.
>> One would normally want to cache a DocIdSet that provides a cheap disi.
>>
>> For the javadocs of the current CWF it could be sufficient to mention more
>> prominently that the default CWF caches the given DocIdSet, basically
>> assuming that its disi is cheap.
>>
>> But it might be a good idea to change the default implementation to check
>> whether the given DocIdSet is an OpenBitSet, cache it as is in that case,
>> and otherwise provide an OpenBitSetDISI.
>>
>> Regards,
>> Paul Elschot
>>
>
>
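
For illustration only (this is not code from the thread): a rough sketch of
the caching rule Paul describes above. A set that is already bit-set backed
is kept as is, and anything else is drained once into a bit set.
java.util.BitSet stands in for OpenBitSet, and DocIds, BitSetDocIds and
CacheableDocIds are hypothetical names rather than Lucene classes.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

/** Hypothetical stand-in for a DocIdSet: something that can iterate doc IDs. */
interface DocIds {
    PrimitiveIterator.OfInt iterator();
}

/** The cheap variation: backed by an in-memory bit set. */
final class BitSetDocIds implements DocIds {
    final BitSet bits;

    BitSetDocIds(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public PrimitiveIterator.OfInt iterator() {
        return bits.stream().iterator(); // ascending order of set bits
    }
}

final class CacheableDocIds {
    /** Returns a set that is cheap to iterate repeatedly and safe to cache. */
    static DocIds toCacheable(DocIds docIds) {
        if (docIds instanceof BitSetDocIds) {
            return docIds; // already cheap, cache as is
        }
        // BitSet grows on demand, so the number of results
        // does not need to be known in advance.
        BitSet bits = new BitSet();
        PrimitiveIterator.OfInt it = docIds.iterator();
        while (it.hasNext()) {
            bits.set(it.nextInt());
        }
        return new BitSetDocIds(bits);
    }
}
```

The expensive iterator (a Scorer, or a Filter reading Payloads) is walked
exactly once. Later searches only touch the in-memory copy, at the cost of
roughly one bit per document up to the highest doc ID seen.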

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
