[GitHub] [lucene-solr] bruno-roustant edited a comment on issue #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.

GitBox Wed, 26 Feb 2020 00:45:45 -0800

bruno-roustant edited a comment on issue #1270: LUCENE-9237: Faster 
UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#issuecomment-590780144
 
 
   When I debugged thoroughly to understand the limitation of our previous 
approach (computing the common prefix between two consecutive block keys in 
the FST), I saw that for every FuzzyQuery the common prefix actually matched, 
so we entered all blocks.
   I realized that the FuzzyQuery automaton accepts many variations of the 
prefix, and the common prefix was not long enough to let us filter blocks 
effectively.
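   
   To illustrate why a short common prefix cannot rule a block out, here is a 
minimal sketch. It is not the UniformSplit code: the class and method names are 
hypothetical, and plain Levenshtein distance (no transpositions) stands in for 
the FuzzyQuery automaton. It checks whether any string starting with a given 
prefix could still be within `maxEdits` of the query term.

```java
// Hypothetical sketch, not Lucene code: plain Levenshtein distance stands in
// for the FuzzyQuery automaton.
final class PrefixFilterSketch {

  /** Length of the common prefix of two block keys. */
  static int commonPrefixLength(String a, String b) {
    int n = Math.min(a.length(), b.length());
    int i = 0;
    while (i < n && a.charAt(i) == b.charAt(i)) {
      i++;
    }
    return i;
  }

  /**
   * Returns true if some string starting with {@code prefix} is within
   * {@code maxEdits} Levenshtein edits of {@code target}. This holds exactly
   * when the prefix is within maxEdits of at least one prefix of the target,
   * i.e. when the minimum of the last DP row is small enough.
   */
  static boolean prefixCanMatch(String prefix, String target, int maxEdits) {
    int[] prev = new int[target.length() + 1];
    for (int j = 0; j <= target.length(); j++) {
      prev[j] = j; // distance from the empty prefix to target[0..j)
    }
    for (int i = 1; i <= prefix.length(); i++) {
      int[] cur = new int[target.length() + 1];
      cur[0] = i;
      for (int j = 1; j <= target.length(); j++) {
        int cost = prefix.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
        int substitute = prev[j - 1] + cost;
        int delete = prev[j] + 1;
        int insert = cur[j - 1] + 1;
        cur[j] = Math.min(substitute, Math.min(delete, insert));
      }
      prev = cur;
    }
    int best = Integer.MAX_VALUE;
    for (int d : prev) {
      best = Math.min(best, d);
    }
    return best <= maxEdits;
  }
}
```

   With a query term "search" and 2 edits, a one-character common prefix such 
as "b" still passes the check, so the block has to be entered. Only a longer 
prefix such as "bzzz" would be rejected.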
   
   I looked at what VarGapFixedInterval did. After each term, it always sought 
the next target term accepted by the automaton. This was efficient enough 
thanks to a vital optimization: it compared the target term to the immediately 
following term, so most of the time it did not actually jump.
   
   So I applied the same idea of computing the next accepted term and jumping 
to it, but now guarded by a first condition based on the number of 
consecutively rejected terms, and with the comparison between the accepted 
term and the immediately following term done ahead of the jump. This is the 
main factor in the improvement. We also leverage optimizations that speed up 
the automaton validation of terms in the block.
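   
   The strategy described above can be sketched roughly as follows. This is a 
simplified illustration under stated assumptions, not the IntersectBlockReader 
implementation: the names are hypothetical, the block is a sorted list of 
strings, a `TreeSet` of accepted terms stands in for the automaton (its 
`ceiling` call plays the role of "compute the next accepted term"), and 
ordering is `String.compareTo` rather than byte order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch, not Lucene code.
final class SeekAfterRejectsSketch {

  /** Number of real jumps made by the last call to intersect (for illustration). */
  static int lastJumpCount;

  /**
   * Scans the sorted block terms linearly. After maxRejects consecutive
   * rejections, computes the next accepted term and jumps to it, unless that
   * term is not beyond the immediately following term (then keeps scanning).
   */
  static List<String> intersect(List<String> blockTerms, TreeSet<String> accepted, int maxRejects) {
    lastJumpCount = 0;
    List<String> hits = new ArrayList<>();
    int rejects = 0;
    int i = 0;
    while (i < blockTerms.size()) {
      String term = blockTerms.get(i);
      if (accepted.contains(term)) {
        hits.add(term);
        rejects = 0;
        i++;
        continue;
      }
      rejects++;
      if (rejects < maxRejects) {
        i++; // first condition: not enough consecutive rejections yet
        continue;
      }
      String target = accepted.ceiling(term); // next term the "automaton" accepts
      if (target == null || i + 1 >= blockTerms.size()) {
        break; // nothing more can match in this block
      }
      if (target.compareTo(blockTerms.get(i + 1)) <= 0) {
        i++; // target is at or before the next term: no jump needed
      } else {
        int pos = Collections.binarySearch(blockTerms, target);
        i = pos >= 0 ? pos : -pos - 1; // exact hit or insertion point
        lastJumpCount++;
        rejects = 0;
      }
    }
    return hits;
  }
}
```

   For example, with block terms [apple, apply, apron, banana, band, bandit, 
cherry, cider, zebra], accepted terms {apply, cider, zulu} and a threshold of 
2, the scan rejects apron and banana, then jumps once directly to cider.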
   
   Regarding the proposal to store the block prefix in the BlockHeader: does 
that mean we have to open the block to get the prefix? I ask because the speed 
for FuzzyQuery highly depends on how many blocks we *don't* open.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

