[jira] Updated: (LUCENE-1195) Performance improvement for TermInfosReader

robert engels (JIRA) Wed, 10 Sep 2008 20:13:36 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


robert engels updated LUCENE-1195:
----------------------------------

    Attachment: SafeThreadLocal.java

A "safe" ThreadLocal that can be used for more deterministic memory usage.

Probably a bit slower than the JDK ThreadLocal, due to the synchronization.

Offers a "purge()" method to force the cleanup of stale entries. This is probably
most useful in code like this:

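        SomeLargeObject slo; // maybe a RAMDirectory?
        try {
                slo = new SomeLargeObject(); // or another creation mechanism
        } catch (OutOfMemoryError e) {
                // Java reports heap exhaustion as OutOfMemoryError (an Error,
                // not an Exception). Drop stale per-thread entries first...
                SafeThreadLocal.purge();
                // ...then try again; this second attempt may still fail
                slo = new SomeLargeObject(); // or another creation mechanism
        }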
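The attached SafeThreadLocal.java is not reproduced in this message. As a rough illustration only, here is a minimal sketch of how such a class could work, assuming values are kept in an ordinary synchronized map keyed by Thread and that a static purge() drops the entries of every thread that has terminated. The instance registry, size(), purgeDeadThreads() and the other internals are hypothetical and may differ from the attachment.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical sketch of a "safe" thread-local, not the attached implementation.
class SafeThreadLocal<T> {
    // Registry of all instances so the static purge() can reach them.
    // Held weakly, so the registry does not keep unused instances alive.
    private static final Set<SafeThreadLocal<?>> INSTANCES =
            Collections.synchronizedSet(Collections.newSetFromMap(
                    new WeakHashMap<SafeThreadLocal<?>, Boolean>()));

    // Per-thread values. Weak keys also let entries of garbage-collected
    // Thread objects disappear even if purge() is never called.
    private final Map<Thread, T> values = new WeakHashMap<>();

    SafeThreadLocal() {
        INSTANCES.add(this);
    }

    /** Value returned to a thread that has not called set() yet. */
    protected T initialValue() {
        return null;
    }

    public T get() {
        Thread current = Thread.currentThread();
        synchronized (values) { // every access is synchronized
            if (!values.containsKey(current)) {
                values.put(current, initialValue());
            }
            return values.get(current);
        }
    }

    public void set(T value) {
        synchronized (values) {
            values.put(Thread.currentThread(), value);
        }
    }

    public void remove() {
        synchronized (values) {
            values.remove(Thread.currentThread());
        }
    }

    /** Number of stored entries, for diagnostics. */
    public int size() {
        synchronized (values) {
            return values.size();
        }
    }

    /** Removes the entries of terminated threads from every instance. */
    public static void purge() {
        // Snapshot the registry so no instance lock is taken while holding it.
        for (SafeThreadLocal<?> local : INSTANCES.toArray(new SafeThreadLocal<?>[0])) {
            local.purgeDeadThreads();
        }
    }

    private void purgeDeadThreads() {
        synchronized (values) {
            Iterator<Thread> it = values.keySet().iterator();
            while (it.hasNext()) {
                if (!it.next().isAlive()) {
                    it.remove(); // stale entry: its thread has finished
                }
            }
        }
    }
}
```

In this sketch, a thread that sets a value and then exits leaves its entry behind until purge() runs or its Thread object is collected; for example, after a worker thread calls set() and is joined, size() still counts its entry, and SafeThreadLocal.purge() removes it. Every get() and set() takes a lock, which is where the synchronization cost mentioned above comes from.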




> Performance improvement for TermInfosReader
> -------------------------------------------
>
>                 Key: LUCENE-1195
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1195
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.4
>
>         Attachments: lucene-1195.patch, lucene-1195.patch, lucene-1195.patch, SafeThreadLocal.java
>
>
> Currently we have a bottleneck for multi-term queries: the dictionary lookup
> is done twice for each term. The first time is in Similarity.idf(), where
> searcher.docFreq() is called; the second time is when the posting list is
> opened (TermDocs or TermPositions).
> The dictionary lookup is not cheap, which is why avoiding the second lookup
> can yield a significant performance improvement. An easy way to do this is
> to add a small LRU cache to TermInfosReader.
>
> I ran some performance experiments with an LRU cache size of 20 and a
> mid-size index of 500,000 documents from Wikipedia. Here are some test
> results:
> 50,000 AND queries with 3 terms each:
> old:                  152 secs
> new (with LRU cache): 112 secs (26% less time)
> 50,000 OR queries with 3 terms each:
> old:                  175 secs
> new (with LRU cache): 133 secs (24% less time)
>
> For bigger indexes this patch will probably have less impact; for smaller
> ones, more.
> I will attach a patch soon.
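The proposal describes the cache only in prose. Below is a minimal sketch of a cache of that shape, assuming java.util.LinkedHashMap in access-order mode as the LRU mechanism. LruCache, CachedDictionary and lookup() are hypothetical names, String and Integer stand in for Lucene's Term and TermInfo, the plain map stands in for the expensive on-disk dictionary search, and the synchronized lookup is an assumption rather than anything taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache; not the code from lucene-1195.patch.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCache(int maxSize) {
        // accessOrder = true: iteration runs from least to most recently
        // accessed entry, so the "eldest" entry is the LRU one.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put(); evict once the cache exceeds maxSize.
        return size() > maxSize;
    }
}

// Hypothetical dictionary wrapper that counts how often the slow path runs.
class CachedDictionary {
    private final Map<String, Integer> dictionary; // term -> docFreq stand-in
    private final LruCache<String, Integer> cache;
    int expensiveLookups = 0;

    CachedDictionary(Map<String, Integer> dictionary, int cacheSize) {
        this.dictionary = dictionary;
        this.cache = new LruCache<>(cacheSize);
    }

    synchronized Integer lookup(String term) {
        Integer cached = cache.get(term); // a hit also marks the entry as recent
        if (cached != null) {
            return cached;
        }
        expensiveLookups++;
        Integer info = dictionary.get(term); // stands in for the dictionary seek/scan
        if (info != null) {
            cache.put(term, info);
        }
        return info;
    }
}
```

Calling lookup() twice for the same term mirrors the pattern described above: the first call (as in docFreq() for idf()) pays for the dictionary search, and the second (as when TermDocs or TermPositions is opened) is answered from the cache. Presumably a cache only needs to hold the terms of the queries currently being processed, which would explain why a size as small as 20 can be enough.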

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

