Hi,
my response was a bit unclear. Before Lucene 4.0 we saved *deletions* in
a bitset (1 = doc deleted), so you were able to use the DocIdSetIterator
provided directly. At that time there was no sparse implementation.
My idea was more about this: "Because we marked *deleted* docs (not live
Good point, I opened an issue to discuss this:
https://github.com/apache/lucene/issues/13084.
Did we actually use a sparse bit set to encode deleted docs before? I don't
recall that.
On Tue, Feb 6, 2024 at 2:42 PM Uwe Schindler wrote:
> Hi,
>
> A SparseBitset impl for DELETES would be fine if t
Hi,
A SparseBitset impl for DELETES would be fine if the model in Lucene
encoded deleted docs (it did that in earlier times). As deletes are
sparse (in most cases <40%), this would help to make the iterator
cheaper.
Uwe
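
(A rough sketch of what this could look like, purely as an illustration, since
the current model records live docs rather than deletes: a SparseFixedBitSet
holding deleted doc IDs could be wrapped in a BitSetIterator, so walking the
deletes costs roughly one step per deleted doc instead of per doc in the
segment. Class and method names below are made up for the example.)

import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.SparseFixedBitSet;

class SparseDeletesSketch {
  // Hypothetical model: deletes stored directly in a sparse bit set.
  static long visitDeletedDocs(SparseFixedBitSet deletedDocs) throws IOException {
    // The cost hint is roughly the number of deletes, not maxDoc.
    DocIdSetIterator deletes =
        new BitSetIterator(deletedDocs, deletedDocs.approximateCardinality());
    long visited = 0;
    for (int doc = deletes.nextDoc();
        doc != DocIdSetIterator.NO_MORE_DOCS;
        doc = deletes.nextDoc()) {
      visited++; // doc is a deleted doc ID
    }
    return visited;
  }
}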
On 06.02.2024 at 09:01, Adrien Grand wrote:
Hey Mich
Hey Michael,
You are right, iterating all deletes with nextClearBit() would run in
O(maxDoc). I am coming from the other direction, where I'm expecting the
number of deletes to be more in the order of 1%-5% of the doc ID space, so
a separate int[] would use lots of heap and probably not help that
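
(Back-of-envelope, to make the heap point concrete, assuming 4-byte doc IDs and
ignoring array headers: a bit set over all docs costs about maxDoc / 8 bytes
regardless of how many deletes there are, while an int[] of D deleted doc IDs
costs about 4 * D bytes. The two break even at D = maxDoc / 32, i.e. roughly 3%
deletes, so in the 1%-5% range the int[] can already be as large as, or larger
than, the bit set it would sit alongside.)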
Thanks Adrien!
My thinking with a separate iterator was that nextClearBit() is relatively
expensive (O(maxDoc) to traverse everything, I think). The solution I was
imagining would involve an index-time change to output, say, an int[] of
deleted docIDs if the number is sufficiently small (like mayb
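
(Sketch of the kind of side-car iterator being suggested, with illustrative
names only; nothing like this is produced at index time today. It walks a
small, sorted int[] of deleted doc IDs at a cost proportional to the number of
deletes.)

import org.apache.lucene.search.DocIdSetIterator;

// Illustrative only: a cheap iterator over a small, sorted array of deleted doc IDs.
final class DeletedDocsIterator extends DocIdSetIterator {
  private final int[] deletedDocs; // sorted ascending, no duplicates
  private int upTo = -1;
  private int doc = -1;

  DeletedDocsIterator(int[] deletedDocs) {
    this.deletedDocs = deletedDocs;
  }

  @Override
  public int docID() {
    return doc;
  }

  @Override
  public int nextDoc() {
    return advance(doc + 1);
  }

  @Override
  public int advance(int target) {
    // A linear scan is enough for a sketch; a real impl could binary search.
    while (++upTo < deletedDocs.length) {
      if (deletedDocs[upTo] >= target) {
        return doc = deletedDocs[upTo];
      }
    }
    return doc = NO_MORE_DOCS;
  }

  @Override
  public long cost() {
    return deletedDocs.length;
  }
}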
Hi Michael,
Indeed, only MatchAllDocsQuery knows how to produce a count when there are
deletes.
Your idea sounds good to me. Do you actually need a side-car iterator for
deletes, or could you use a nextClearBit() operation on the bit set?
I don't think we can fold it into Weight#count since ther
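
(To make the nextClearBit() suggestion concrete, here is how it looks with
java.util.BitSet, which already has such a method; whether Lucene's bit sets
should expose an equivalent is part of the question. Deleted docs are the clear
bits of the live-docs set, so they can be visited like this, though the scan is
still proportional to maxDoc rather than to the number of deletes.)

import java.util.BitSet;

class NextClearBitSketch {
  // Visit deleted docs as the clear bits of a live-docs bit set.
  static void visitDeleted(BitSet liveDocs, int maxDoc) {
    for (int doc = liveDocs.nextClearBit(0);
        doc < maxDoc;
        doc = liveDocs.nextClearBit(doc + 1)) {
      // doc is deleted
    }
  }
}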
Hi,
On OpenSearch, we've been taking advantage of the various O(1)
Weight#count() implementations to quickly compute various aggregations
without needing to iterate over all the matching documents (at least when
the top-level query is functionally a match-all at the segment level). Of
course, from
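
(For context, a minimal sketch of the per-segment short-circuit being referred
to; the helper name is illustrative, the Lucene call is
Weight#count(LeafReaderContext), which returns -1 when the count is not cheap
to compute for that segment.)

import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Weight;

class CountShortcutSketch {
  // Sum per-segment counts, taking the cheap Weight#count path when a segment supports it.
  static long countOrFallback(IndexSearcher searcher, Query query) throws IOException {
    Weight weight =
        searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE_NO_SCORES, 1f);
    long total = 0;
    for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) {
      int count = weight.count(ctx); // -1 means "not cheaply computable" for this segment
      if (count >= 0) {
        total += count;
      } else {
        // fall back to iterating the segment's matches (omitted)
      }
    }
    return total;
  }
}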