[ https://issues.apache.org/jira/browse/LUCENE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Feng Guo updated LUCENE-10233:
------------------------------
    Description:
In low-cardinality point cases, doc id blocks usually store doc ids that share the same point value, so intersect ends up in the addAll logic. If we store the ids as a bitset when leafCardinality = 1, and give the IntersectVisitor a bulk-visiting ability (something like visit(DocIdSetIterator iterator)), we can speed up addAll, because we can simply execute the 'or' logic between the result and the block ids.

Concerns:
1. A bitset could occupy more disk space. (Maybe we can restrict this optimization to blocks where (max-min) <= n * count?)
2. MergeReader will become slower, because it needs to iterate docIds one by one.

  was:
In low-cardinality point cases, doc id blocks usually store doc ids that share the same point value, so intersect ends up in the addAll logic. If we store the ids as a bitset when leafCardinality = 1, and give the IntersectVisitor a bulk-visiting ability (something like visit(DocIdSetIterator iterator)), we can speed up addAll, because we can simply execute the 'or' logic between the result and the block ids.

Concerns:
1. A bitset could occupy more disk space. (Maybe we can restrict this optimization to blocks where (max-min) * n <= count?)
2. MergeReader will become slower, because it needs to iterate docIds one by one.

> Store docIds as bitset when leafCardinality = 1 to speed up addAll
> ------------------------------------------------------------------
>
>                 Key: LUCENE-10233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10233
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>
> In low cardinality points cases, id blocks will usually store doc ids that
> have the same point value, and intersect will get into addAll logic.
> If we store ids as bitset when the leafCardinality = 1, and give the
> IntersectVisitor bulk visiting ability (something like
> visit(DocIdSetIterator iterator)), we can speed up addAll because we can
> just execute the 'or' logic between the result and the block ids.
> Concerns:
> 1. Bitset could occupy more disk space. (Maybe we can force this optimization
> to only work when the block's (max-min) <= n * count?)
> 2. MergeReader will become slower because it needs to iterate docIds one by
> one.
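
For illustration only, here is a minimal sketch of the idea described in the issue. It is not Lucene code: java.util.BitSet stands in for whatever bitset the result and the on-disk block would actually use, and the class and method names (BitsetBlockSketch, shouldUseBitset, toBitset, addAllBulk, addAllOneByOne) are hypothetical. The density check mirrors the (max-min) <= n * count condition from concern 1, and the bulk path is a single 'or' instead of a per-doc loop. For simplicity the block bitset is kept at absolute doc id positions; a real encoding would presumably store it relative to the block's minimum doc id.

```java
import java.util.BitSet;

// Hypothetical sketch; names and layout are assumptions, not Lucene's API.
final class BitsetBlockSketch {

    /** Concern 1: only use a bitset when the block is dense enough, i.e. (max - min) <= n * count. */
    static boolean shouldUseBitset(int min, int max, int count, int n) {
        // long arithmetic so that n * count cannot overflow an int
        return (long) max - min <= (long) n * count;
    }

    /** Encodes a block's doc ids as a bitset (absolute positions, to keep the sketch short). */
    static BitSet toBitset(int[] docIds) {
        BitSet bits = new BitSet();
        for (int docId : docIds) {
            bits.set(docId);
        }
        return bits;
    }

    /** Bulk addAll: one 'or' between the result and the block ids. */
    static void addAllBulk(BitSet result, BitSet block) {
        result.or(block);
    }

    /** Baseline addAll: visits doc ids one by one. */
    static void addAllOneByOne(BitSet result, int[] docIds) {
        for (int docId : docIds) {
            result.set(docId);
        }
    }

    public static void main(String[] args) {
        int[] docIds = {100, 101, 103, 104, 107}; // one leaf block, all with the same point value
        int min = 100, max = 107, n = 2;          // n is an arbitrary choice for this example

        BitSet bulk = new BitSet();
        bulk.set(5);                              // a doc already collected from another block
        BitSet baseline = (BitSet) bulk.clone();

        if (shouldUseBitset(min, max, docIds.length, n)) {
            addAllBulk(bulk, toBitset(docIds));
        } else {
            addAllOneByOne(bulk, docIds);
        }
        addAllOneByOne(baseline, docIds);

        // Both paths must collect the same set of doc ids.
        System.out.println(bulk.equals(baseline));
    }
}
```

The sketch also shows where concern 2 comes from: a consumer that needs individual doc ids (as MergeReader does) has to walk the set bits of the block one at a time instead of reading a plain id list.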