[jira] [Updated] (LUCENE-4688) Reuse TermsEnum in BlockTreeTermsReader

Simon Willnauer (JIRA) Wed, 16 Jan 2013 07:38:22 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Simon Willnauer updated LUCENE-4688:
------------------------------------

    Attachment: LUCENE-4688.patch

Here is an initial patch, including my small benchmark, which shows a pretty 
significant impact from reuse. 

The benchmark indexes 2 million very small docs and, for each doc, checks 
whether the ID has already been indexed. I use NRT manager to reopen the reader 
every second. 

The results are pretty significant IMO (about 29% less wall-clock time with reuse): 
{noformat}
start benchmark
run with reuse
Run took: 24 seconds with reuse terms enum = [true]
run without reuse
Run took: 34 seconds with reuse terms enum = [false]
{noformat}

While all tests pass with that patch, I would really like somebody (Mike? :) ) 
with more knowledge of the BlockTreeTermsReader to look at it! 

I also ran benchmarks with luceneutil but didn't see any real gains from this 
change so far.
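
For readers unfamiliar with the pattern being benchmarked, here is a minimal sketch of enum reuse during repeated ID lookups. It is not the attached patch and does not use Lucene: `ToyTerms` and `ToyTermsEnum` are hypothetical stand-ins, and only the call shape (`iterator(reuse)` followed by `seekExact`) is borrowed from the `Terms#iterator(reuse)` call named in the issue. It counts how many enum objects are created with and without reuse, assuming a single in-memory "segment".

```java
import java.util.Arrays;

public class TermsEnumReuseSketch {

    /** Hypothetical stand-in for one segment's terms dictionary (not a Lucene class). */
    static final class ToyTerms {
        private final String[] sortedTerms;
        int enumsCreated = 0; // how many enum objects this dictionary had to allocate

        ToyTerms(String... terms) {
            sortedTerms = terms.clone();
            Arrays.sort(sortedTerms);
        }

        /** Returns the caller's enum if it belongs to this dictionary, otherwise allocates a new one. */
        ToyTermsEnum iterator(ToyTermsEnum reuse) {
            if (reuse != null && reuse.owner == this) {
                reuse.reset();
                return reuse;
            }
            enumsCreated++;
            return new ToyTermsEnum(this);
        }
    }

    /** Hypothetical stand-in for an enum positioned on a term (not a Lucene class). */
    static final class ToyTermsEnum {
        final ToyTerms owner;
        private int position = -1;

        ToyTermsEnum(ToyTerms owner) {
            this.owner = owner;
        }

        void reset() {
            position = -1;
        }

        /** Positions the enum on the term and reports whether the term exists. */
        boolean seekExact(String term) {
            int idx = Arrays.binarySearch(owner.sortedTerms, term);
            position = idx >= 0 ? idx : -1;
            return idx >= 0;
        }
    }

    /** Looks up every id and returns the number of ids that were found. */
    static int countHits(ToyTerms terms, String[] ids, boolean reuse) {
        ToyTermsEnum te = null;
        int hits = 0;
        for (String id : ids) {
            // Pass the previous enum back in only when reuse is enabled.
            te = terms.iterator(reuse ? te : null);
            if (te.seekExact(id)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] ids = {"id1", "id2", "id3", "missing"};
        ToyTerms withReuse = new ToyTerms("id1", "id2", "id3");
        ToyTerms withoutReuse = new ToyTerms("id1", "id2", "id3");
        System.out.println("reuse=true  hits=" + countHits(withReuse, ids, true)
                + " enums created=" + withReuse.enumsCreated);
        System.out.println("reuse=false hits=" + countHits(withoutReuse, ids, false)
                + " enums created=" + withoutReuse.enumsCreated);
    }
}
```

The toy only shows the allocation difference (one enum versus one per lookup); it says nothing about how much work a real BlockTreeTermsReader enum does on creation, which is what the benchmark above measures.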
                
> Reuse TermsEnum in BlockTreeTermsReader
> ---------------------------------------
>
>                 Key: LUCENE-4688
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4688
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 4.0, 4.1
>            Reporter: Simon Willnauer
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4688.patch
>
>
> Opening a TermsEnum currently comes with a significant cost if done 
> frequently, as in primary key lookups, or if many segments are present. 
> Currently we don't reuse it at all and create a lot of objects even if the 
> enum is just used for a single seekExact (i.e. TermQuery). Stressing the 
> Terms#iterator(reuse) call shows significant gains with reuse...


