mikemccand commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1830166075
> This is reasonable as the terms index (FST) holds all the terms.

+1, nice!

> #### Fuzzy/Wildcard/Prefix queries got _much slower_
> This is also expected because currently I used the default implementation provided by `TermsEnum` which does not take advantage of the FST. With an optimized implementation I expect it to at least be on-par and slightly better because the FST holds information about all terms, whereas the current BlockTreeTerms only holds prefixes.

OK this makes sense, and it is a (sad) measure of how slow the emulated (on top of `seekCeil`) `.intersect` `TermsEnum` is. Once you have an optimized version it should likely be faster than block tree since it can intersect all suffixes instead of scanning `byte[]` suffixes in the term block and re-testing each.

> #### `HighTermTitleSort` and `HighTermMonthSort` got about 4.5% ~ 10% less throughput
> I don't quite understand why term lookup could affect sorting on a DV field

This is odd. Though, the `HighTermMonthSort` QPS is so crazy high as to not really be trustworthy -- likely BMW is kicking in and saving tons of work.

> #### `AndHighLow` got slower
>
> Am i missing some optimization opportunity for low freq terms?

Hmm maybe pulsing? Are we still inlining single-occurrence terms directly into the terms dict with your new terms dict?
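
For readers following along, here is a rough sketch of why an `.intersect` emulated on top of `seekCeil` tends to be costly: every accepted term triggers another full seek. This is not Lucene code. `SeekCeilIntersectSketch`, `SortedTerms`, and `prefixIntersect` are hypothetical names, a `java.util.TreeSet` stands in for the terms dictionary, and a plain prefix test stands in for the automaton.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative only: none of these types exist in Lucene.
final class SeekCeilIntersectSketch {

    /** Hypothetical stand-in for a terms dictionary that only offers a seekCeil-style lookup. */
    static final class SortedTerms {
        private final TreeSet<String> terms = new TreeSet<>();
        int seeks = 0; // counts lookups so the cost of the emulation is visible

        SortedTerms(String... values) {
            for (String v : values) {
                terms.add(v);
            }
        }

        /** Returns the smallest term >= target, or null if there is none. */
        String seekCeil(String target) {
            seeks++;
            return terms.ceiling(target);
        }
    }

    /** Emulates a prefix "intersect" using nothing but repeated seekCeil calls. */
    static List<String> prefixIntersect(SortedTerms dict, String prefix) {
        List<String> matches = new ArrayList<>();
        String current = dict.seekCeil(prefix);
        while (current != null && current.startsWith(prefix)) {
            matches.add(current);
            // Appending U+0000 yields the smallest string strictly greater than
            // the current term, so the next seek lands on the following term.
            current = dict.seekCeil(current + '\u0000');
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedTerms dict = new SortedTerms("apple", "apply", "apt", "banana", "band");
        System.out.println(prefixIntersect(dict, "ap") + " seeks=" + dict.seeks);
    }
}
```

An implementation that walks the term structure directly could step from one accepted term to the next without restarting the lookup from the top each time, which is the kind of saving discussed above.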