Re: [PR] Random access term dictionary [lucene]

via GitHub Tue, 28 Nov 2023 08:08:57 -0800


mikemccand commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1830166075


   > This is reasonable as the terms index (FST) holds all the terms.
   
   +1, nice!
   
   > #### Fuzzy/Wildcard/Prefix queries got _much slower_
   > This is also expected because currently I used the default implementation
   > provided by `TermsEnum` which does not take advantage of the FST. With an
   > optimized implementation I expect it to at least be on-par and slightly better
   > because the FST holds information about all terms, whereas the current
   > BlockTreeTerms only holds prefixes.
   
   OK, this makes sense, and it is a (sad) measure of how slow the emulated (on
   top of `seekCeil`) `.intersect` `TermsEnum` is.  Once you have an optimized
   version it should likely be faster than block tree, since it can intersect all
   suffixes instead of scanning `byte[]` suffixes in the term block and re-testing
   each.
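
   For readers unfamiliar with the emulated path, here is a rough sketch of what
   "intersect on top of `seekCeil`" amounts to for a prefix query. It is not
   Lucene code: a `TreeSet<String>` stands in for the sorted terms dictionary,
   `ceiling` stands in for `seekCeil`, `higher` stands in for `next()`, and the
   class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative only: hypothetical names, not Lucene's implementation.
class SeekCeilIntersectSketch {

    // Number of terms pulled from the dictionary and tested in the last call.
    static int termsTested = 0;

    // Emulated intersect for a prefix query: seek to the first term >= prefix,
    // then step term by term, re-testing every candidate against the prefix.
    static List<String> prefixIntersect(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        termsTested = 0;
        String term = terms.ceiling(prefix);      // stands in for seekCeil(prefix)
        while (term != null) {
            termsTested++;
            if (!term.startsWith(prefix)) {
                break;                            // sorted order: nothing later can match
            }
            hits.add(term);
            term = terms.higher(term);            // stands in for next()
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> terms =
            new TreeSet<>(List.of("lucene", "lucid", "luck", "lunar", "term"));
        System.out.println(prefixIntersect(terms, "luc")); // prints [lucene, lucid, luck]
        System.out.println(termsTested);                   // prints 4
    }
}
```

   Every candidate costs one dictionary step plus one re-test. An optimized
   `intersect` would instead walk the query automaton and the term structure
   together, so non-matching branches are never visited.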
   
   > #### `HighTermTitleSort` and `HighTermMonthSort` got about 4.5% ~ 10% less throughput
   > I don't quite understand why term lookup could affect sorting on a DV field
   
   This is odd.  Though, the `HighTermMonthSort` QPS is so crazy high as to not
   really be trustworthy -- likely BMW (block-max WAND) is kicking in and saving
   tons of work.
   
   > #### `AndHighLow` got slower
   >
   > Am I missing some optimization opportunity for low freq terms?
   
   Hmm, maybe pulsing?  Are we still inlining single-occurrence terms directly
   into the terms dict in your new implementation?
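
   To make the pulsing question concrete, here is a minimal sketch of the idea,
   under the assumption that a term occurring in exactly one document stores that
   doc ID inline in its terms dict entry, while other terms store a pointer into
   a separate postings store. All names (`PulsingSketch`, `TermMeta`,
   `postingsReads`) are hypothetical and do not reflect Lucene's on-disk format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not Lucene's postings format.
class PulsingSketch {

    // Per-term metadata held in the terms dictionary.
    static final class TermMeta {
        final int docFreq;
        final int singletonDocId;   // meaningful only when docFreq == 1, else -1
        final int postingsIndex;    // meaningful only when docFreq > 1, else -1

        TermMeta(int docFreq, int singletonDocId, int postingsIndex) {
            this.docFreq = docFreq;
            this.singletonDocId = singletonDocId;
            this.postingsIndex = postingsIndex;
        }
    }

    final Map<String, TermMeta> termsDict = new HashMap<>();
    final List<int[]> postingsStore = new ArrayList<>(); // stands in for a postings file
    int postingsReads = 0;                               // counts trips to the postings store

    void addTerm(String term, int... docIds) {
        if (docIds.length == 0) {
            throw new IllegalArgumentException("a term needs at least one doc ID");
        }
        if (docIds.length == 1) {
            // Pulsed: the single doc ID lives in the terms dict entry itself.
            termsDict.put(term, new TermMeta(1, docIds[0], -1));
        } else {
            postingsStore.add(docIds.clone());
            termsDict.put(term, new TermMeta(docIds.length, -1, postingsStore.size() - 1));
        }
    }

    int[] postings(String term) {
        TermMeta meta = termsDict.get(term);
        if (meta == null) {
            return new int[0];
        }
        if (meta.docFreq == 1) {
            return new int[] { meta.singletonDocId };    // no postings read needed
        }
        postingsReads++;
        return postingsStore.get(meta.postingsIndex);
    }

    public static void main(String[] args) {
        PulsingSketch index = new PulsingSketch();
        index.addTerm("rare", 42);
        index.addTerm("common", 1, 5, 9);
        System.out.println(index.postings("rare")[0]);   // prints 42
        System.out.println(index.postingsReads);         // prints 0
    }
}
```

   If a new terms dict dropped this inlining, every low-frequency term lookup
   would pay an extra read, which is one possible explanation for a slowdown on
   a task like `AndHighLow`.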


