[ https://issues.apache.org/jira/browse/LUCENE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527666#comment-17527666 ]
Bruno Roustant commented on LUCENE-8836:
----------------------------------------
Thanks [~jpountz] for this simplified improvement!
I agree to mark this issue as resolved.
> Optimize DocValues TermsDict to continue scanning from the last position when
> possible
> --------------------------------------------------------------------------------------
>
> Key: LUCENE-8836
> URL: https://issues.apache.org/jira/browse/LUCENE-8836
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Bruno Roustant
> Priority: Major
> Labels: docValues, optimization
> Fix For: 9.2
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> Lucene80DocValuesProducer.TermsDict is used to look up either a term or a
> term ordinal.
> Currently it lacks an optimization that FSTEnum has: the ability to continue
> a sequential scan from where the last lookup left off in the IndexInput.
> For sparse lookups (when searching for only a few terms or ordinals) this is
> not an issue. But for multiple lookups in a row, this optimization avoids
> re-scanning all the terms from the block start (since they are delta
> encoded).
> This patch proposes the optimization.
> To estimate the gain, we ran three Lucene tests while counting the seeks and
> the term reads in the IndexInput, with and without the optimization:
> TestLucene70DocValuesFormat: the optimization saves 24% of seeks and 15% of
> term reads.
> TestDocValuesQueries: the optimization adds 0.7% more seeks and 0.003% more
> term reads.
> TestDocValuesRewriteMethod.testRegexps: the optimization saves 71% of seeks
> and 82% of term reads.
> When scanning many terms in lexicographical order, the optimization saves a
> lot. When looking up only a few sparse terms, it brings no improvement, but
> it does not penalize either. It therefore seems worth always having it.
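The continuation idea can be illustrated with a small, self-contained Java sketch. It is not Lucene's TermsDict: the class ContinuingTermsDict, its block layout (every blockSize terms start a new block whose first term is stored in full, and each following term is stored as a shared-prefix length plus a suffix relative to the previous term), and the seeks/termReads counters are all hypothetical, chosen only to mirror the measurements described above. A lookup by ordinal resumes decoding from the current position when the target lies in the same block and is not behind it; otherwise it "seeks" back to the block start and decodes from there.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified block-based terms dictionary (NOT Lucene's TermsDict).
 * Terms inside a block are delta (prefix) encoded, so decoding is sequential.
 */
final class ContinuingTermsDict {
  /** One delta-encoded term: length of prefix shared with the previous term, plus suffix. */
  static final class Entry {
    final int prefixLen;
    final String suffix;
    Entry(int prefixLen, String suffix) { this.prefixLen = prefixLen; this.suffix = suffix; }
  }

  private final List<List<Entry>> blocks = new ArrayList<>();
  private final int blockSize;
  private final int size;
  private final boolean continueScan; // hypothetical switch: enable the optimization or not

  // Position of the sequential "reader" (last decoded ordinal and its term).
  private long currentOrd = -1;
  private String currentTerm = "";

  // Counters analogous to the seeks and term reads measured in the issue.
  long seeks = 0;
  long termReads = 0;

  ContinuingTermsDict(List<String> sortedTerms, int blockSize, boolean continueScan) {
    this.blockSize = blockSize;
    this.size = sortedTerms.size();
    this.continueScan = continueScan;
    String prev = "";
    List<Entry> block = null;
    for (int i = 0; i < size; i++) {
      if (i % blockSize == 0) {
        block = new ArrayList<>();
        blocks.add(block);
        prev = ""; // block start: the first term is stored in full
      }
      String term = sortedTerms.get(i);
      int p = commonPrefix(prev, term);
      block.add(new Entry(p, term.substring(p)));
      prev = term;
    }
  }

  private static int commonPrefix(String a, String b) {
    int n = Math.min(a.length(), b.length());
    int i = 0;
    while (i < n && a.charAt(i) == b.charAt(i)) {
      i++;
    }
    return i;
  }

  /** Returns the term with ordinal {@code ord}, reusing the current position when possible. */
  String seekOrd(long ord) {
    if (ord < 0 || ord >= size) {
      throw new IllegalArgumentException("ord out of range: " + ord);
    }
    int block = (int) (ord / blockSize);
    boolean sameBlock = currentOrd >= 0 && currentOrd / blockSize == block;
    if (!continueScan || !sameBlock || ord < currentOrd) {
      // "Seek": jump back to the block start and decode from its first term.
      seeks++;
      currentOrd = (long) block * blockSize - 1;
      currentTerm = "";
    }
    List<Entry> entries = blocks.get(block);
    while (currentOrd < ord) { // decode forward, one delta-encoded term at a time
      currentOrd++;
      Entry e = entries.get((int) (currentOrd % blockSize));
      currentTerm = currentTerm.substring(0, e.prefixLen) + e.suffix;
      termReads++;
    }
    return currentTerm;
  }

  public static void main(String[] args) {
    List<String> terms = new ArrayList<>();
    for (int i = 0; i < 64; i++) {
      terms.add(String.format("term%03d", i));
    }
    for (boolean cont : new boolean[] {false, true}) {
      ContinuingTermsDict dict = new ContinuingTermsDict(terms, 16, cont);
      for (long ord = 0; ord < terms.size(); ord++) { // sequential scan over all ordinals
        if (!dict.seekOrd(ord).equals(terms.get((int) ord))) {
          throw new AssertionError("wrong term at ord " + ord);
        }
      }
      System.out.println("continueScan=" + cont + " seeks=" + dict.seeks + " termReads=" + dict.termReads);
    }
  }
}
```

Scanning the 64 ordinals in order prints `continueScan=false seeks=64 termReads=544` followed by `continueScan=true seeks=4 termReads=64`. In this sketch a lookup that moves backwards or into another block still pays exactly one seek, and a forward lookup within the block never decodes more terms than a fresh scan from the block start would, which is why sparse lookups are not penalized here.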
--
This message was sent by Atlassian Jira
(v8.20.7#820007)