bruno-roustant edited a comment on issue #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#issuecomment-590780144

When I debugged thoroughly to understand the limitation of our previous approach (computing the common prefix between two consecutive block keys in the FST), I saw that for every FuzzyQuery the common prefix matched, so we entered all blocks. I realized that the FuzzyQuery automaton accepts many variations of the prefix, and the common prefix was not long enough to let us filter blocks correctly.

I looked at what VarGapFixedInterval did. After each term, it always jumped to the next target term accepted by the automaton. This was efficient enough thanks to a vital optimization: it compared the target term to the immediately following term, so most of the time it did not actually jump.

So I applied the same idea of computing the next accepted term and jumping to it, but with a first condition based on the number of consecutively rejected terms, and by anticipating the comparison of the accepted term with the immediate next term. This is the main factor in the improvement. We also leverage optimizations that speed up the automaton validation of terms in the block.

Regarding the proposal to store the block prefix in the BlockHeader: does that mean we have to open the block to get the prefix? The speed for FuzzyQuery depends heavily on how many blocks we *don't* open.
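
To illustrate the skip heuristic described above, here is a minimal Python sketch. It is not the code from the PR: `intersect`, `reject_threshold`, and the finite sorted `accepted_language` (standing in for the automaton and its "next accepted term" computation) are all hypothetical names chosen for the example, and a flat sorted list stands in for the terms of the dictionary.

```python
from bisect import bisect_left


def intersect(terms, accepted_language, reject_threshold=2):
    """Scan sorted `terms`, keeping those found in `accepted_language`.

    Toy model only: the accepted language is a finite set of strings,
    so "next accepted term" is a binary search rather than an automaton walk.
    Returns (matches, number of terms tested, number of jumps).
    """
    language = sorted(accepted_language)
    accepted = set(language)
    matches, tested, jumps = [], 0, 0
    rejected = 0  # count of consecutively rejected terms
    i = 0
    while i < len(terms):
        term = terms[i]
        i += 1
        tested += 1
        if term in accepted:
            matches.append(term)
            rejected = 0
            continue
        rejected += 1
        # First condition: only consider jumping after enough
        # consecutive rejections, and only if terms remain.
        if rejected < reject_threshold or i == len(terms):
            continue
        # Smallest accepted term that sorts at or after the rejected term.
        k = bisect_left(language, term)
        if k == len(language):
            break  # nothing left that could be accepted
        target = language[k]
        # Anticipated comparison: if the immediate next term is already
        # at or past the target, a jump would gain nothing.
        if terms[i] >= target:
            continue
        # Otherwise jump straight to the first term >= target.
        i = bisect_left(terms, target, i)
        jumps += 1
        rejected = 0
    return matches, tested, jumps


terms = ["apple", "apply", "banana", "band", "bank",
         "cat", "cow", "dog", "zebra"]
matches, tested, jumps = intersect(terms, {"bank", "dog", "door"})
```

In this toy run the scan jumps once (over "banana" and "band") and skips the jump before "dog", because the next term is already the target. The threshold value of 2 is arbitrary here and is only meant to show where such a condition would sit.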