[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

Uwe Schindler (JIRA) Mon, 23 Nov 2009 13:12:03 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781621#action_12781621
 ]


Uwe Schindler commented on LUCENE-2075:
---------------------------------------

I changed my benchmark to better show the seek caching effect. For NRQ the 
overall improvement has no negligible effect.

I changed the rewrite mode of the NRQ to SCORING_BOOLEAN_QUERY_REWRITE and then 
just rewrote the queries to BQ and measured the time, so no TermDocs 
enumeration or collecting was involved:

trunk: avg number of terms: 68.537; avg seeks=28.838; best time=1.022756 ms; 
worst time=17.036802 ms; avg=1.8388833272 ms
patch: avg number of terms: 68.537; avg seeks=28.838; best time=1.066616 ms; 
worst time=12.80917 ms; avg=1.6932529156 ms

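You can see the effect of the caching: the average rewrite time drops from 
about 1.84 ms to 1.69 ms, and the worst case from 17.0 ms to 12.8 ms. The code 
ran 5000 rewrites with each query repeated 20 times.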
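
For readers who want to reproduce this kind of measurement, here is a minimal 
timing sketch. It is not the benchmark code used above: the class name 
RewriteTimer and its Stats type are hypothetical, each Runnable merely stands 
in for one query rewrite (which is not shown), and reading "5000 rewrites with 
each query repeated 20 times" as 250 queries x 20 repeats is an assumption.

```java
import java.util.List;

/** Hypothetical timing helper; not part of Lucene or of the original benchmark. */
final class RewriteTimer {

    /** Best, worst and average wall-clock time per run, in milliseconds. */
    static final class Stats {
        final int runs;
        final double bestMs;
        final double worstMs;
        final double avgMs;

        Stats(int runs, double bestMs, double worstMs, double avgMs) {
            this.runs = runs;
            this.bestMs = bestMs;
            this.worstMs = worstMs;
            this.avgMs = avgMs;
        }
    }

    /** Runs every task 'repeats' times and times each run individually. */
    static Stats time(List<Runnable> tasks, int repeats) {
        double best = Double.MAX_VALUE;
        double worst = 0.0;
        double total = 0.0;
        int runs = 0;
        for (Runnable task : tasks) {
            for (int i = 0; i < repeats; i++) {
                long start = System.nanoTime();
                task.run(); // assumption: this would be the query rewrite
                double ms = (System.nanoTime() - start) / 1e6;
                best = Math.min(best, ms);
                worst = Math.max(worst, ms);
                total += ms;
                runs++;
            }
        }
        if (runs == 0) {
            throw new IllegalArgumentException("nothing to time");
        }
        return new Stats(runs, best, worst, total / runs);
    }
}
```

Repeating each task back to back is what lets a term cache show up in the 
numbers: only the first run of a query has to seek, and later runs can hit the 
cache.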

> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
>                 Key: LUCENE-2075
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2075
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch, 
> LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, 
> LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, 
> LUCENE-2075.patch
>
>
> Right now each thread creates its own (thread private) SimpleLRUCache,
> holding up to 1024 terms.
> This is rather wasteful, since if there are a high number of threads
> that come through Lucene, you're multiplying the RAM usage.  You're
> also cutting way back on likelihood of a cache hit (except the known
> multiple times we look up a term within-query, which uses one thread).
> In NRT search we open new SegmentReaders (on tiny segments) often
> which each thread must then spend CPU/RAM creating & populating.
> Now that we are on 1.5 we can use java.util.concurrent.*, eg
> ConcurrentHashMap.  One simple approach could be a double-barrel LRU
> cache, using 2 maps (primary, secondary).  You check the cache by
> first checking primary; if that's a miss, you check secondary and if
> you get a hit you promote it to primary.  Once primary is full you
> clear secondary and swap them.
> Or... any other suggested approach?
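
The double-barrel idea described in the issue can be sketched roughly as 
follows. This is an illustration only, not the attached ConcurrentLRUCache.java 
or the code in the patches: the class name DoubleBarrelCache and its methods 
are hypothetical, it uses modern Java syntax, and it is deliberately 
best-effort under concurrency (an entry can occasionally be dropped during a 
swap, which is assumed to be acceptable for a cache).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a two-map ("double-barrel") approximate LRU cache. */
final class DoubleBarrelCache<K, V> {
    private final int maxSize;
    private volatile Map<K, V> primary = new ConcurrentHashMap<>();
    private volatile Map<K, V> secondary = new ConcurrentHashMap<>();

    DoubleBarrelCache(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
    }

    /** Checks primary first; a hit in secondary is promoted to primary. */
    V get(K key) {
        V value = primary.get(key);
        if (value == null) {
            value = secondary.get(key);
            if (value != null) {
                put(key, value); // promote the recently used entry
            }
        }
        return value;
    }

    /** Adds to primary; once primary is full, clears secondary and swaps. */
    void put(K key, V value) {
        primary.put(key, value);
        if (primary.size() >= maxSize) {
            swap();
        }
    }

    private synchronized void swap() {
        if (primary.size() < maxSize) {
            return; // another thread already swapped
        }
        Map<K, V> full = primary;
        Map<K, V> old = secondary;
        old.clear();      // entries not promoted since the last swap are evicted
        secondary = full; // the full map becomes the fallback
        primary = old;    // the emptied map starts collecting recent entries
    }
}
```

With maxSize = 2, putting "a" and "b" fills primary and triggers a swap. A 
later get("a") promotes "a" back into primary, so when "c" is added and the 
maps swap again, the untouched "b" is dropped while "a" and "c" remain 
reachable.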
