[ https://issues.apache.org/jira/browse/LUCENE-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon Willnauer updated LUCENE-4688:
------------------------------------
Attachment: LUCENE-4688.patch
Here is an initial patch, including my small benchmark, which shows a pretty
significant impact from reuse.
The benchmark indexes 2 million very small docs and checks for each doc whether
its ID has already been indexed. I use NRTManager to reopen the reader every
second.
The results are pretty significant IMO (about 29% less wall-clock time with
reuse):
{noformat}
start benchmark
run with reuse
Run took: 24 seconds with reuse terms enum = [true]
run without reuse
Run took: 34 seconds with reuse terms enum = [false]
{noformat}
While all tests pass with that patch, I would really like somebody (Mike? :) )
with more knowledge of the BlockTreeTermsReader to look at it!
I also ran benchmarks with luceneutil but didn't see any real gains from this
change so far.
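To make the reuse pattern concrete, here is a minimal, self-contained sketch.
It is a toy model under stated assumptions, not the patch and not Lucene code:
ToyTerms and ToyTermsEnum are hypothetical stand-ins for Terms and TermsEnum,
and only the shape of the iterator(reuse) call is taken from the issue
description. It counts how many enum objects get allocated for a series of
primary-key style lookups, with and without reuse.

```java
import java.util.Arrays;

class TermsEnumReuseSketch {

    /** Toy stand-in for a per-segment terms dictionary (hypothetical, not Lucene's Terms). */
    static final class ToyTerms {
        private final String[] sortedTerms;
        int enumsAllocated = 0; // number of enum objects created so far

        ToyTerms(String... terms) {
            sortedTerms = terms.clone();
            Arrays.sort(sortedTerms);
        }

        /** Same shape as Terms#iterator(reuse): recycle the enum if it came from this instance. */
        ToyTermsEnum iterator(ToyTermsEnum reuse) {
            if (reuse != null && reuse.owner == this) {
                reuse.reset();
                return reuse;
            }
            enumsAllocated++;
            return new ToyTermsEnum(this);
        }
    }

    /** Toy stand-in for TermsEnum; only supports an exact-match seek. */
    static final class ToyTermsEnum {
        final ToyTerms owner;
        private int position = -1;

        ToyTermsEnum(ToyTerms owner) {
            this.owner = owner;
        }

        void reset() {
            position = -1;
        }

        boolean seekExact(String term) {
            int idx = Arrays.binarySearch(owner.sortedTerms, term);
            position = idx >= 0 ? idx : -1;
            return idx >= 0;
        }
    }

    /** Looks up every id once and returns how many were found. */
    static int countHits(ToyTerms terms, String[] ids, boolean reuse) {
        ToyTermsEnum termsEnum = null;
        int hits = 0;
        for (String id : ids) {
            // Pass the previous enum back in only when reuse is enabled.
            termsEnum = terms.iterator(reuse ? termsEnum : null);
            if (termsEnum.seekExact(id)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] ids = {"id1", "id2", "id3", "missing"};
        ToyTerms withReuse = new ToyTerms("id1", "id2", "id3");
        ToyTerms withoutReuse = new ToyTerms("id1", "id2", "id3");
        int hitsWithReuse = countHits(withReuse, ids, true);
        int hitsWithoutReuse = countHits(withoutReuse, ids, false);
        System.out.println("reuse:    hits=" + hitsWithReuse
                + " enums allocated=" + withReuse.enumsAllocated);
        System.out.println("no reuse: hits=" + hitsWithoutReuse
                + " enums allocated=" + withoutReuse.enumsAllocated);
    }
}
```

Both runs find the same IDs; the only difference is one allocation per lookup
versus one allocation overall. The toy enum is trivially cheap to create, so it
shows the allocation pattern only and says nothing about the timings above.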
> Reuse TermsEnum in BlockTreeTermsReader
> ---------------------------------------
>
> Key: LUCENE-4688
> URL: https://issues.apache.org/jira/browse/LUCENE-4688
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: 4.0, 4.1
> Reporter: Simon Willnauer
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4688.patch
>
>
> Opening a TermsEnum comes with a significant cost at this point if done
> frequently, e.g. for primary key lookups, or if many segments are present.
> Currently we don't reuse it at all and create a lot of objects even if the
> enum is just used for a single seekExact (i.e. TermQuery). Stressing the
> Terms#iterator(reuse) call shows significant gains with reuse...