[ https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882295#comment-16882295 ]
ASF subversion and git services commented on LUCENE-8875:
---------------------------------------------------------

Commit 7339eb272c30e993e0a8e73154fdfca8ef9879e4 in lucene-solr's branch refs/heads/branch_8x from Atri Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7339eb2 ]

LUCENE-8875: Introduce Optimized Collector For Large Number Of Hits (#754)

This commit introduces a new collector which is optimized for cases when the number of hits is large and/or the actual hits collected are sparse in comparison to the number of hits requested.


> Should TopScoreDocCollector Always Populate Sentinel Values?
> ------------------------------------------------------------
>
>                 Key: LUCENE-8875
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8875
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Priority: Major
>          Time Spent: 9h
>  Remaining Estimate: 0h
>
> TopScoreDocCollector always initializes HitQueue as the PQ implementation,
> and instructs HitQueue to populate itself with sentinels. While this is a great
> safety mechanism, for very large datasets where the query's selectivity is
> high, the sentinel population can be redundant and can become a large enough
> bottleneck in itself. Does it make sense to introduce a new parameter in
> TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and
> does not populate sentinels?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
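To make the trade-off in the quoted issue concrete, here is a minimal, self-contained Java sketch (Java 16+ for the record type). It does not use Lucene's HitQueue or TopScoreDocCollector. The class SentinelTopK, the Hit record, and the SENTINEL_THRESHOLD of 10,000 (borrowed from the issue's "say number of hits > 10k" example) are all hypothetical. With sentinels, the queue is filled up front with numHits placeholder entries (doc Integer.MAX_VALUE, score negative infinity, similar in spirit to HitQueue's sentinel), so collect() never has to handle a partly empty queue. Without them, the queue grows lazily and no placeholder objects are allocated when only a few documents match.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical top-k collector contrasting sentinel prefill with lazy fill.
// Illustration only: this is not Lucene's HitQueue or TopScoreDocCollector.
public class SentinelTopK {

    /** A collected hit (hypothetical type). */
    public record Hit(int doc, float score) {}

    /** Hypothetical cut-off, taken from the issue's "say number of hits > 10k" example. */
    static final int SENTINEL_THRESHOLD = 10_000;

    /** Orders hits weakest-first: lower score, or higher doc id on equal scores. */
    static final Comparator<Hit> ORDER = (a, b) -> {
        int c = Float.compare(a.score(), b.score());
        return c != 0 ? c : Integer.compare(b.doc(), a.doc());
    };

    private final int numHits;
    private final boolean prePopulate;
    // Min-heap: the weakest hit kept so far sits at the head.
    private final PriorityQueue<Hit> pq = new PriorityQueue<>(ORDER);

    public SentinelTopK(int numHits) {
        if (numHits < 1) {
            throw new IllegalArgumentException("numHits must be >= 1");
        }
        this.numHits = numHits;
        this.prePopulate = numHits <= SENTINEL_THRESHOLD;
        if (prePopulate) {
            // Fill every slot with a sentinel that any real hit outranks,
            // so the queue is "full" before the first document is collected.
            for (int i = 0; i < numHits; i++) {
                pq.add(new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY));
            }
        }
    }

    public boolean usesSentinels() {
        return prePopulate;
    }

    public void collect(int doc, float score) {
        Hit candidate = new Hit(doc, score);
        if (pq.size() < numHits) {
            // Only reachable without sentinels: there is still a free slot.
            pq.add(candidate);
        } else if (ORDER.compare(candidate, pq.peek()) > 0) {
            // Candidate beats the weakest kept entry (possibly a sentinel).
            pq.poll();
            pq.add(candidate);
        }
    }

    /** Returns the real hits, best first; sentinels that were never replaced are dropped. */
    public List<Hit> topHits() {
        List<Hit> out = new ArrayList<>();
        for (Hit h : pq) {
            // Assumes real doc ids are always < Integer.MAX_VALUE.
            if (h.doc() != Integer.MAX_VALUE) {
                out.add(h);
            }
        }
        out.sort(ORDER.reversed());
        return out;
    }
}
```

The sketch shows the choice the issue asks about. Sentinels let the hot collect() path always compare against a full queue, but they cost numHits allocations before any document is scored. That cost is wasted when numHits is large and only a small fraction of that many documents actually match.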