mccullocht commented on PR #15633: URL: https://github.com/apache/lucene/pull/15633#issuecomment-3825145153
I am also mostly interested in the pklookup case, particularly in some update-heavy workloads. We tried using bloom filters on our primary key field but noticed they were never called 🙃.

I ran luceneutil benches with wikimediumall and `doUpdates=True` for 5 iterations on each of the baseline and the candidate, and averaged the two metrics that looked most like throughput measures: indexing time and plain text GB/s. The baseline came out ahead: -1.5% indexing time and +3.5% plain text GB/s. I'm not sure if there's a better way of testing this within luceneutil, but I'm also thinking about trying to write a JMH benchmark that captures this. It's just a bit tricky, as this is not a public API.

I've also thought about enabling this more selectively. It could be a parameter in `IndexWriterConfig`, but I think it's a very deep configuration parameter and could be confusing. `Terms` or `TermsEnum` could also have a `preferExact()` method or similar that could be overridden for bloom codecs or the like to control this behavior.
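
To make the `preferExact()` idea concrete, here is a rough, self-contained sketch. It is not Lucene code: `SketchTerms`, `PlainTerms`, `BloomLikeTerms`, and `PkLookup` are hypothetical stand-ins, and it assumes the behavior being controlled is the choice between an exact lookup and an ordered ceiling seek when resolving a primary key.

```java
import java.util.TreeSet;

// Hypothetical stand-in for a terms dictionary; NOT Lucene's Terms/TermsEnum API.
abstract class SketchTerms {
    protected final TreeSet<String> terms = new TreeSet<>();
    int exactCalls = 0; // counts exact lookups, for illustration only
    int ceilCalls = 0;  // counts ceiling seeks, for illustration only

    void add(String term) {
        terms.add(term);
    }

    // Hypothetical hook: return true when exact lookups should be preferred,
    // e.g. because a bloom filter can reject missing keys cheaply.
    boolean preferExact() {
        return false;
    }

    // Exact membership test.
    boolean seekExact(String term) {
        exactCalls++;
        return terms.contains(term);
    }

    // Returns the smallest term >= target, or null if there is none.
    String seekCeil(String term) {
        ceilCalls++;
        return terms.ceiling(term);
    }
}

// Default behavior: no preference for exact lookups.
class PlainTerms extends SketchTerms {}

// A bloom-style implementation (assumed) opts in to exact lookups.
class BloomLikeTerms extends SketchTerms {
    @Override
    boolean preferExact() {
        return true;
    }
}

final class PkLookup {
    // Resolves a key using whichever strategy the terms implementation prefers.
    static boolean contains(SketchTerms t, String key) {
        if (t.preferExact()) {
            return t.seekExact(key);
        }
        String ceil = t.seekCeil(key);
        return key.equals(ceil);
    }
}
```

With a hook like this, the caller stays unchanged and each terms implementation decides which path it gets, which avoids exposing a deep knob on `IndexWriterConfig`.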
