mikemccand opened a new issue, #12358:
URL: https://github.com/apache/lucene/issues/12358

   ### Description
   
   Context: we (Amazon's customer-facing product search team, and also AWS) are 
attempting to understand the amazing performance advantage Tantivy (a Rust search 
engine) has over Lucene, iterating in [this GitHub 
repo](https://github.com/Tony-X/search-benchmark-game).  That repo is roughly a 
merger of Lucene's benchmarking code 
([luceneutil](https://github.com/mikemccand/luceneutil)), including its tasks 
and `enwiki` corpus, and the [open source Tantivy 
benchmark](https://github.com/quickwit-oss/search-benchmark-game).  Tantivy is 
impressively fast :)
   
   This issue is a spinoff from [this fascinating 
comment](https://github.com/Tony-X/search-benchmark-game/issues/30#issuecomment-1579761787)
 by @fulmicoton, creator and maintainer of 
[Tantivy](https://github.com/quickwit-oss/tantivy).
   
   Tantivy optimizes `count()` for `BooleanQuery` disjunctions much like 
Lucene's `BooleanScorer`, by scoring in a windowed bitset of N docs at once, 
and then pop-counting the set bits in each window.  This is not technically a 
sub-linear implementation: it is still linear, but I suspect with a smaller 
constant factor than the default `count()` fallback Lucene implements.
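
To make the idea concrete, here is a minimal sketch of windowed bitset counting. It is not Tantivy's or Lucene's actual code: the class name `WindowedDisjunctionCount`, the constant `WINDOW_SIZE`, and the modeling of each clause's postings as a sorted `int[]` of doc IDs are all assumptions made for illustration.

```java
import java.util.Arrays;

final class WindowedDisjunctionCount {
    // Window size in docs; a hypothetical value chosen for this sketch.
    static final int WINDOW_SIZE = 2048;

    /**
     * Counts distinct doc IDs across all clauses (the size of their union).
     * Assumes each postings array is sorted ascending with IDs in [0, maxDoc).
     */
    static int count(int[][] postings, int maxDoc) {
        long[] bits = new long[WINDOW_SIZE / 64];
        int[] pos = new int[postings.length]; // read cursor per clause
        int total = 0;
        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int end = Math.min(base + WINDOW_SIZE, maxDoc);
            Arrays.fill(bits, 0L);
            // Set one bit per matching doc in this window; duplicates across
            // clauses collapse onto the same bit.
            for (int c = 0; c < postings.length; c++) {
                int[] docs = postings[c];
                int p = pos[c];
                while (p < docs.length && docs[p] < end) {
                    int off = docs[p] - base;
                    bits[off >>> 6] |= 1L << (off & 63);
                    p++;
                }
                pos[c] = p;
            }
            // Pop-count the window instead of visiting each hit one by one.
            for (long word : bits) {
                total += Long.bitCount(word);
            }
        }
        return total;
    }
}
```

The work is still linear in the total number of postings, but the per-document cost is a bit set plus an amortized `Long.bitCount`, with no per-hit collector call or priority-queue merge.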
   
   Perhaps, for all cases where `BooleanQuery` uses the windowed 
`BooleanScorer`, we could also implement this `count()` optimization.
   
   From my read of Lucene's `BooleanWeight.count`, I don't think Lucene has 
this optimization.  Maybe we should port over Tantivy's approach?  It 
should make disjunctive counting quite a bit faster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

