[ https://issues.apache.org/jira/browse/LUCENE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444527#comment-17444527 ]
Feng Guo edited comment on LUCENE-10233 at 11/16/21, 1:40 PM:
--------------------------------------------------------------

[~jpountz] Sadly, I found that the implementation of {{SparseFixedBitSet}} is much more complicated than I thought, and it seems that the {{or}} operation between a {{FixedBitSet}} and a {{SparseFixedBitSet}} cannot simply be implemented with {{|}}. I need to spend some more time working out how to implement this algorithm, and I'm not sure it can be as efficient as before.

Considering the new challenges, I made some new [changes|https://github.com/apache/lucene/pull/438/commits/292e4ed43119832d626506336ed61152f733a431] to my original approach: I added extra 0 words for the bitset in the original method, and I created a new expert interface to get the bitset regardless of the docBase. Generally, users can simply use the old interface to get a bitset, because the content represented by the bitset is consistent with the idSetIterator.

I wonder if this approach addresses your concerns?

was (Author: gf2121):
    [~jpountz] Sadly, I found that the implementation of {{SparseFixedBitSet}} is much more complicated than I thought, and it seems that the {{or}} operation between a {{FixedBitSet}} and a {{SparseFixedBitSet}} cannot simply be implemented with {{|}}. I need to spend some more time working out how to implement this algorithm, and I'm not sure it can be as efficient as before.

Considering the new challenges, I made some new [changes|https://github.com/apache/lucene/pull/438/commits/292e4ed43119832d626506336ed61152f733a431] to my original approach: I considered docBase in the original method, adding extra 0 words for it, and I created a new expert interface to get the bitset regardless of the docBase. Generally, users can simply use the old interface to get a bitset, because the content represented by the bitset is consistent with the idSetIterator.

I wonder if this approach addresses your concerns?
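To make the "extra 0 words" idea concrete, here is a minimal sketch using plain {{long[]}} arrays rather than Lucene's {{FixedBitSet}}. It assumes the block's bitset is stored relative to the word that contains its smallest doc id, so that OR-ing it into a dense result at that word offset is equivalent to OR-ing a copy padded with leading zero words. The class and method names ({{OffsetBitsetOr}}, {{encodeBlock}}, {{orInto}}, {{get}}) are hypothetical and are not taken from the pull request.

```java
final class OffsetBitsetOr {

    /** Encodes doc ids into 64-bit words, relative to the word index firstWord. */
    static long[] encodeBlock(int[] docIds, int firstWord, int numWords) {
        long[] words = new long[numWords];
        for (int doc : docIds) {
            words[(doc >> 6) - firstWord] |= 1L << (doc & 63);
        }
        return words;
    }

    /**
     * ORs the block into the result starting at word index firstWord.
     * This matches OR-ing a block padded with firstWord leading zero words.
     */
    static void orInto(long[] result, long[] block, int firstWord) {
        for (int i = 0; i < block.length; i++) {
            result[firstWord + i] |= block[i];
        }
    }

    /** Returns true if the bit for the given doc id is set. */
    static boolean get(long[] bits, int doc) {
        return (bits[doc >> 6] & (1L << (doc & 63))) != 0;
    }

    public static void main(String[] args) {
        int[] docIds = {130, 131, 200};
        int firstWord = 130 >> 6;                 // word holding the smallest doc id
        int numWords = (200 >> 6) - firstWord + 1;
        long[] block = encodeBlock(docIds, firstWord, numWords);

        long[] result = new long[8];              // room for doc ids 0..511
        orInto(result, block, firstWord);
        System.out.println(get(result, 131) + " " + get(result, 132));
    }
}
```

A word-aligned OR like this only works when both sides are dense word arrays, which is one way to read the difficulty with {{SparseFixedBitSet}} described above.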
> Store docIds as bitset when leafCardinality = 1 to speed up addAll
> ------------------------------------------------------------------
>
>                 Key: LUCENE-10233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10233
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In low cardinality points cases, id blocks will usually store doc ids that
> have the same point value, and {{intersect}} will get into {{addAll}} logic.
> If we store ids as bitset, and give the IntersectVisitor bulk visiting
> ability, we can speed up addAll because we can just execute the 'or' logic
> between the result and the block ids.
> Optimization will be triggered when the following conditions are met at the
> same time:
> # leafCardinality = 1
> # max(docId) - min(docId) <= 16 * pointCount (in order to avoid expanding
> too much storage)
> # no duplicate doc id
> I mocked a field that has 10,000,000 docs per value and search it with a 1
> term PointInSetQuery, the build scorer time decreased from 71ms to 8ms.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
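
The three trigger conditions in the quoted issue description can be sketched as a single predicate. This is only an illustration under stated assumptions: {{BitsetLeafEligibility}} and {{canStoreAsBitset}} are hypothetical names, the doc ids of one leaf block are assumed to be available as an {{int[]}}, and the check sorts a copy rather than relying on any ordering the real writer may guarantee.

```java
import java.util.Arrays;

final class BitsetLeafEligibility {

    /**
     * Returns true when a leaf block meets all three conditions from the issue:
     * leafCardinality = 1, max(docId) - min(docId) <= 16 * pointCount,
     * and no duplicate doc id.
     */
    static boolean canStoreAsBitset(int leafCardinality, int[] docIds) {
        if (leafCardinality != 1 || docIds.length == 0) {
            return false;
        }
        int[] sorted = docIds.clone();   // assumption: input order is unknown
        Arrays.sort(sorted);
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] == sorted[i - 1]) {
                return false;            // duplicate doc id
            }
        }
        long range = (long) sorted[sorted.length - 1] - sorted[0];
        return range <= 16L * sorted.length;  // bound the bitset's storage
    }

    public static void main(String[] args) {
        System.out.println(canStoreAsBitset(1, new int[] {9, 3, 5}));
        System.out.println(canStoreAsBitset(1, new int[] {0, 1000}));
    }
}
```

The range bound keeps the bitset at roughly 16 bits per point in the worst case, which is the storage concern the second condition mentions.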