mccullocht commented on PR #15633:
URL: https://github.com/apache/lucene/pull/15633#issuecomment-3825145153

   I am also mostly interested in the pklookup case, particularly in some 
update-heavy workloads. We tried using bloom filters on our primary key field 
but noticed they were never called 🙃.
   
   I ran luceneutil benches with wikimediumall and `doUpdates=True` for 5 
iterations on each of the baseline and the candidate, then averaged the two 
metrics that looked most like throughput measures: indexing time and plain 
text GB/s. The baseline came out ahead: -1.5% indexing time and +3.5% plain 
text GB/s. I'm not sure if there's a better way of testing this within 
luceneutil. I'm also thinking about writing a JMH benchmark that captures 
this, but it's a bit tricky as this is not a public API.
   
   I've also thought about enabling this more selectively. It could be a 
parameter in `IndexWriterConfig`, but I think that is a very low-level setting 
to expose there and it could be confusing. Alternatively, `Terms` or 
`TermsEnum` could have a `preferExact()` method or similar that bloom codecs 
and the like could override to control this behavior.
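
   A rough sketch of the `preferExact()` idea, to make the shape concrete. 
Nothing here is real Lucene API: `SketchTerms`, `SortedTerms`, 
`BloomBackedTerms` and `PkLookup` are hypothetical stand-ins, and it assumes 
the behavior being controlled is the choice between an exact-match lookup and 
an ordered (ceiling-style) seek.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for a terms dictionary; NOT Lucene's Terms/TermsEnum. */
interface SketchTerms {
  /** Exact-match lookup: true only if the key is stored. */
  boolean seekExact(String key);

  /** Ordered lookup: smallest stored key >= the given key, or null if none. */
  String seekCeil(String key);

  /** Hypothetical hint; the default keeps ordered seeks. */
  default boolean preferExact() {
    return false;
  }
}

/** Plain sorted dictionary that does not ask for exact seeks. */
class SortedTerms implements SketchTerms {
  private final TreeSet<String> keys = new TreeSet<>();

  SortedTerms(List<String> initialKeys) {
    keys.addAll(initialKeys);
  }

  @Override
  public boolean seekExact(String key) {
    return keys.contains(key);
  }

  @Override
  public String seekCeil(String key) {
    return keys.ceiling(key);
  }
}

/** Stand-in for a bloom-style codec: it opts in to exact seeks. */
class BloomBackedTerms extends SortedTerms {
  int filterChecks = 0; // counts how often the (imaginary) filter is consulted

  BloomBackedTerms(List<String> initialKeys) {
    super(initialKeys);
  }

  @Override
  public boolean seekExact(String key) {
    filterChecks++; // a real filter could reject most absent keys at this point
    return super.seekExact(key);
  }

  @Override
  public boolean preferExact() {
    return true;
  }
}

/** Caller side: pick the lookup strategy from the hint. */
final class PkLookup {
  static boolean exists(SketchTerms terms, String key) {
    if (terms.preferExact()) {
      return terms.seekExact(key);
    }
    // Ordered path: the key exists only if the ceiling is the key itself.
    return key.equals(terms.seekCeil(key));
  }

  public static void main(String[] args) {
    BloomBackedTerms bloom = new BloomBackedTerms(List.of("id-1", "id-3"));
    System.out.println(exists(bloom, "id-1") + " " + exists(bloom, "id-2"));
    System.out.println("filter checks: " + bloom.filterChecks);
  }
}
```

   With a hint like this the caller needs no new `IndexWriterConfig` setting; 
the codec that benefits from exact seeks declares it, and everything else 
keeps the default path.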


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

