[ 
https://issues.apache.org/jira/browse/LUCENE-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5441:
---------------------------------
    Attachment: LUCENE-5441.patch

I think this is an important change, so I tried to iterate on Shai and Uwe's 
work. A few things have changed since the last patch: OpenBitSet has been 
removed, and FixedBitSet has a new thinner brother, SparseFixedBitSet. Here is 
a summary of what this new patch changes:
 - FixedBitSet renamed to IntBitSet
 - SparseFixedBitSet renamed to SparseIntBitSet
 - IntBitSet and SparseIntBitSet do not extend DocIdSet anymore, you need to 
wrap the bit set with (Sparse)IntBitDocIdSet
 - IntBitDocIdSet and SparseBitDocIdSet require the {{cost}} to be provided 
explicitly, so that you can use the actual set cardinality if you already know 
it, or use cardinality() if it makes sense. This should help make better 
decisions when there are bitsets involved.

The major difference compared to the previous patch is that the 
or/and/andNot/xor methods are still on FixedBitSet. The reasoning here is that 
since IntBitDocIdSet does not expose mutators, it makes sense for it to cache 
the cost. I think these mutator methods would make more sense on something like 
oal.util.DocIdSetBuilder.
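
To illustrate the idea of a read-only wrapper whose cost is supplied at 
construction time, here is a minimal sketch. It is not code from the patch: it 
uses java.util.BitSet as a stand-in for the bit set, and the names 
DocIdSetView, BitSetDocIdSetView and withExactCost are hypothetical, chosen 
only for this example.

```java
import java.util.BitSet;

// Hypothetical stand-in for a DocIdSet-like read-only view (not Lucene's API).
interface DocIdSetView {
    long cost();            // estimated number of matching documents
    int nextDoc(int from);  // first set doc id >= from, or -1 if there is none
}

// Hypothetical wrapper: exposes no mutators and takes its cost explicitly.
final class BitSetDocIdSetView implements DocIdSetView {
    private final BitSet bits;
    private final long cost;

    BitSetDocIdSetView(BitSet bits, long cost) {
        // The bits are not copied, so the cached cost only stays accurate
        // if the caller stops mutating the bit set after wrapping it.
        this.bits = bits;
        this.cost = cost;
    }

    // For callers that do not already know the cardinality.
    static BitSetDocIdSetView withExactCost(BitSet bits) {
        return new BitSetDocIdSetView(bits, bits.cardinality());
    }

    @Override
    public long cost() {
        return cost; // fixed at construction, never recomputed
    }

    @Override
    public int nextDoc(int from) {
        return bits.nextSetBit(from);
    }
}

class CostDemo {
    public static void main(String[] args) {
        BitSet bits = new BitSet(64);
        bits.set(3);
        bits.set(17);
        bits.set(42);

        // Cardinality already known by the caller: pass it directly.
        DocIdSetView known = new BitSetDocIdSetView(bits, 3);
        // Cardinality unknown: pay for one cardinality() call up front.
        DocIdSetView computed = BitSetDocIdSetView.withExactCost(bits);

        System.out.println(known.cost() + " " + computed.cost() + " " + known.nextDoc(4));
        // prints "3 3 17"
    }
}
```

Because the wrapper has no or/and/andNot/xor of its own, the cost passed in 
cannot be invalidated through the wrapper itself.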

> Decouple DocIdSet from OpenBitSet and FixedBitSet
> -------------------------------------------------
>
>                 Key: LUCENE-5441
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5441
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/other
>    Affects Versions: 4.6.1
>            Reporter: Uwe Schindler
>             Fix For: Trunk
>
>         Attachments: LUCENE-5441.patch, LUCENE-5441.patch, LUCENE-5441.patch, 
> LUCENE-5441.patch
>
>
> Back from the times of Lucene 2.4 when DocIdSet was introduced, we somehow 
> kept the stupid "filters can return a BitSet directly" in the code. So lots 
> of Filters just return a FixedBitSet, because DocIdSet is the superclass 
> (ideally interface) of FixedBitSet.
> We should decouple that and *not* have FixedBitSet implement that abstract 
> interface directly. This leads to bugs, e.g. in BlockJoin, which used Filters 
> in a wrong way just because they were always returning bit sets. But some 
> filters actually don't do this.
> I propose to let FixedBitSet (only in trunk, because that is a major 
> backwards break) just have a method {{asDocIdSet()}} that returns an 
> anonymous instance of DocIdSet: bits() returns the FixedBitSet itself, 
> iterator() returns a new Iterator (like it always did) and the 
> cost/cacheable methods return static values.
> Filters in trunk would need to be changed like that:
> {code:java}
> FixedBitSet bits = ....
> ...
> return bits;
> {code}
> gets:
> {code:java}
> FixedBitSet bits = ....
> ...
> return bits.asDocIdSet();
> {code}
> As this method returns an anonymous DocIdSet, calling code can no longer 
> rely on or check whether the implementation behind it is a FixedBitSet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
