[
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496276#comment-14496276
]
Luc Vanlerberghe edited comment on LUCENE-6427 at 4/15/15 2:49 PM:
-------------------------------------------------------------------
I still have some issues with the ensureCapacity methods:
The doc comment states:
{code}
* If the given {@link FixedBitSet} is large enough to hold {@code numBits},
* returns the given bits, otherwise returns a new {@link FixedBitSet} which
* can hold the requested number of bits.
{code}
while actually it checks if it is large enough to hold (numBits+1) bits (I
already changed the doc comment in one of the commits of my pull request to
reflect this).
In lucene/solr the typical usage is:
# {code}
docsWithField = FixedBitSet.ensureCapacity(docsWithField, docID);
docsWithField.set(docID);
{code}
but also:
# {code}
newbits = FixedBitSet.ensureCapacity(newbits, otherDocSet.bits.length());
newbits.or(otherDocSet.bits);
{code}
(1) only works because the doc comment doesn't correspond to the
implementation. Correct usage would be ... ensureCapacity(docsWithField, docID
+ 1)
(2) will unexpectly grow newBits even when it is exactly the same size.
The implementation is written as if the numBits argument is a "number of bits"
value, but then proceeds to allocate at least an extra long in the backing
array...
There are several options here:
# Only update the doc comment (like I did) so unsuspecting users don't get an
unexpected performance hit when manipulating equal sized bitsets, but then the
name stays awkward.
# Fix the implementation and update all locations in lucene/solr where it is
used (but this may/will affect custom modules without warning)
# Rename the methods and "numBits" argument (ensureIndexAvailable anyone?).
This will break the compilation of custom modules, but it's @lucene.internal
anyway, not a public api.
was (Author: lvl):
I still have some issues with the ensureCapacity methods:
The doc comment states:
{code}
* If the given {@link FixedBitSet} is large enough to hold {@code numBits},
* returns the given bits, otherwise returns a new {@link FixedBitSet} which
* can hold the requested number of bits.
{code}
while actually it checks if it is large enough to hold (numBits+1) bits (I
already changed the doc comment in one of the commits of my pull request to
reflect this).
In lucene/solr the typical usage is:
# {code}
docsWithField = FixedBitSet.ensureCapacity(docsWithField, docID);
docsWithField.set(docID);
{code}
but also:
# {code}
newbits = FixedBitSet.ensureCapacity(newbits, otherDocSet.bits.length());
newbits.or(otherDocSet.bits);
{code}
(1) only works because the doc comment doesn't correspond to the
implementation. Correct usage would be ... ensureCapacity(docsWithField, docID
+ 1)
(2) will unexpectly grow newBits even when it is exactly the same size.
The implementation is written as if the numBits argument is a "number of bits"
value, but then proceeds to allocate at least an extra long in the backing
array...
There are several options here:
# Only update the doc comment (like I did) so unsuspecting users don't get an
unexpected performance hit when manipulating equal sized bitsets, but then the
name stays awkward.
# Fix the implementation and update all locations in lucene/solr where it is
used (but this may/will affect custom modules without warning)
# Rename the methods and "numBits" argument (ensureIndexAvailable anyone?).
This will break the compilation of custom modules, but it's @lucene.internal
anyway, not a public api.
> BitSet fixes - assert on presence of 'ghost bits' and others
> ------------------------------------------------------------
>
> Key: LUCENE-6427
> URL: https://issues.apache.org/jira/browse/LUCENE-6427
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/other
> Reporter: Luc Vanlerberghe
>
> Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and
> corresponding tests:
> * Some methods rely on the fact that no bits are set after numBits (what I
> call 'ghost' bits here).
> ** cardinality, nextSetBit, intersects and others may yield wrong results
> ** If ghost bits are present, they may become visible after ensureCapacity is
> called.
> ** The tests deliberately create bitsets with ghost bits, but then do not
> detect these failures
> * FixedBitSet.cardinality scans the complete backing array, even if only
> numWords are in use
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]