[ http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12431696 ]

paul.elschot commented on LUCENE-584:
-------------------------------------

Yonik, as to your questions:

> It looks like no Filters currently return a matcher, so the current patch 
> just lays the groundwork, right?

Right. Only the previous Filter-20060628.patch contains some commented FIXME 
code to actually introduce a BitsMatcher in each case where a BitSet is used.

> When some filters do start to return a matcher, it looks like support for the 
> 1.4 BooleanScorer needs to be removed, or a check done in 
> IndexSearcher.search() to disable skipping on the scorer if it's in use.

IIRC the patch still supports the 1.4 BooleanScorer when a BitSet is returned 
by Filter. I'd have to look at the patched IndexSearcher to be sure, though.
A BitSet is randomly addressable, so it can filter the 1.4 BooleanScorer, 
which can score documents out of order. This case could be deprecated 
completely by also deprecating use of the 1.4 boolean scorer, but that is not 
in the patch. The patch only deprecates the Filter.bits() method.
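
To illustrate why random access matters here, a minimal sketch follows. It is 
not code from the patch: DocCollector and RandomAccessFilterSketch are 
hypothetical names, and DocCollector only stands in for a hit collector. The 
point is that a BitSet.get() check works whatever order the documents arrive 
in.

```java
import java.util.BitSet;

final class RandomAccessFilterSketch {
    /** Hypothetical stand-in for a hit collector; not the Lucene class. */
    interface DocCollector {
        void collect(int doc, float score);
    }

    /**
     * Wraps a collector so that only documents whose bit is set get through.
     * Each document is tested on its own, so out-of-order delivery is fine.
     */
    static DocCollector filtered(BitSet bits, DocCollector delegate) {
        return (doc, score) -> {
            if (bits.get(doc)) {
                delegate.collect(doc, score);
            }
        };
    }
}
```

A matcher that can only move forward cannot be used this way, because it has 
no means of answering a question about a document it has already passed.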


> I wonder what the performance impact is... for a dense search with a dense 
> bitset filter, it looks like quite a bit of overhead is added (two calls in 
> order to get the next doc, use of nextSetBit() instead of get(), checking 
> "exhausted" each time and checking for -1 to set exhausted). I suppose one 
> can always drop back to using a HitCollector for special cases though.

BitsMatcher could also work without the "exhausted" flag, but then an infinite 
loop might occur when trying to continue after the first time next() or 
skipTo() returned false.
Continuing after false was returned in these cases is a bug, but an infinite 
loop can be difficult to debug. I'd rather be on the safe side of that with 
the exhausted flag and wait for an actual profile to show the performance 
problem.
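
As a rough illustration of the trade-off, here is a minimal sketch of a 
BitSet-backed matcher with such a flag. It is not the attached 
BitsMatcher.java: the class name BitSetMatcherSketch and the exact semantics 
of next(), skipTo() and doc() are assumptions made for the example.

```java
import java.util.BitSet;

/** Hypothetical in-order matcher over a BitSet; an assumption, not the patch. */
final class BitSetMatcherSketch {
    private final BitSet bits;
    private int doc = -1;              // current document, -1 before the first call
    private boolean exhausted = false; // set once nextSetBit() has returned -1

    BitSetMatcherSketch(BitSet bits) {
        this.bits = bits;
    }

    /** Moves to the next set bit; returns false once there are no more. */
    boolean next() {
        return !exhausted && advanceTo(doc + 1);
    }

    /** Moves to the first set bit at or after target, always moving forward. */
    boolean skipTo(int target) {
        return !exhausted && advanceTo(Math.max(target, doc + 1));
    }

    /** Current document; only meaningful after next() or skipTo() returned true. */
    int doc() {
        return doc;
    }

    private boolean advanceTo(int from) {
        doc = bits.nextSetBit(from); // -1 when no further bit is set
        if (doc == -1) {
            exhausted = true;
        }
        return !exhausted;
    }
}
```

In this sketch, dropping the flag would let a next() call after exhaustion 
evaluate nextSetBit(0) and silently restart at the first set bit, so a caller 
that ignores the false return value could loop forever.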

Regards,
Paul Elschot


> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: http://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: BitsMatcher.java, Filter-20060628.patch, 
> HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
> MatchCollector.java, Matcher.java, Matcher20060830.patch, 
> Matcher20060830b.patch, Scorer-20060628.patch, Searchable-20060628.patch, 
> Searcher-20060628.patch, SortedVIntList.java, TestSortedVIntList.java
>
>
> {code}
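> package org.apache.lucene.search;
>
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
>
> public abstract class Filter implements java.io.Serializable 
> {
>   // Proposed: return an abstraction instead of java.util.BitSet.
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
>
> // Would live in its own source file, AbstractBitSet.java.
> public interface AbstractBitSet 
> {
>   public boolean get(int index);
> }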
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract 
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's 
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
> memory. It would be desirable to have an alternative BitSet implementation 
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation 
> could still delegate to =java.util.BitSet=.
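
A minimal sketch of the proposal quoted above, under stated assumptions: 
AbstractBitSet is the interface from the issue description, while DenseBits 
and SortedIntBits are hypothetical names invented for this example. The sparse 
variant uses a plain sorted int array, which is not the SortedVIntList 
attached to the issue.

```java
import java.util.Arrays;
import java.util.BitSet;

/** The interface proposed in the issue description. */
interface AbstractBitSet {
    boolean get(int index);
}

/** Default implementation (hypothetical name): delegates to java.util.BitSet. */
final class DenseBits implements AbstractBitSet {
    private final BitSet bits;

    DenseBits(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean get(int index) {
        return bits.get(index);
    }
}

/** Sparse alternative (hypothetical): one int per visible document. */
final class SortedIntBits implements AbstractBitSet {
    private final int[] sortedDocs;

    SortedIntBits(int[] docs) {
        sortedDocs = docs.clone(); // defensive copy, then sort for binary search
        Arrays.sort(sortedDocs);
    }

    @Override
    public boolean get(int index) {
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }
}
```

The sparse variant costs four bytes per visible document instead of one bit 
per document in the index, so it only pays off when a small fraction of the 
index is visible.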



