[ https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781594#action_12781594 ]

Uwe Schindler commented on LUCENE-2075:
---------------------------------------

bq. I wonder if your test is getting any cache hits at all - if you do random 
ranges and never repeat queries, then your hit rate is likely quite low?

I am quite sure that Robert's test is also random (as he explained).

I fixed the test to run only a few queries and repeat them quite often. For 
precStep=4 and long values, I got about 28 seeks per query, but there was no 
speed improvement. Maybe 28 seeks/query is too few for an effect to show. The 
number of terms seen per query was 70, so about 2.5 terms/seek, which is 
typical for precStep=4 with this index value density (5 million random numbers 
in the range -2^63..2^63). It is also important that the random ranges hit many 
documents (on average 1/3 of all docs), so in my opinion most of the time is 
spent collecting the results. Maybe I should try shorter, more limited ranges.
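
For reference, a minimal sketch of how such a test case looks with the 
NumericField/NumericRangeQuery API (the field name and bounds here are 
illustrative, not the actual test values):

{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.NumericRangeQuery;

public class PrecStepExample {
  public static void main(String[] args) {
    // Indexing side: a long value with precisionStep=4
    // (64 bits / 4 = 16 trie terms per document).
    Document doc = new Document();
    doc.add(new NumericField("value", 4, Field.Store.NO, true)
        .setLongValue(123456789L));

    // Query side: the precisionStep must match the one used at index time.
    NumericRangeQuery<Long> q = NumericRangeQuery.newLongRange(
        "value", 4, -1000000L, 1000000L, true, true);
    System.out.println(q);
  }
}
{code}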

Robert: How many term enum seeks did your queries produce?

Currently I am indexing a 100 million doc, precStep=1, long values index (64 
terms per doc). Let's see what happens here.

If you deprecate SimpleLRUCache, you can also deprecate the MapCache abstract 
superclass. But I would not like to see these classes deprecated, as I use 
them in my own code, e.g. for caching queries. And even if you deprecate the 
map, why remove the tests? They should stay alive until the class is removed.

Uwe

> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
>                 Key: LUCENE-2075
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2075
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch, 
> LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, 
> LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, 
> LUCENE-2075.patch
>
>
> Right now each thread creates its own (thread private) SimpleLRUCache,
> holding up to 1024 terms.
> This is rather wasteful, since if a high number of threads come through
> Lucene, you're multiplying the RAM usage.  You're also cutting way back
> on the likelihood of a cache hit (except for the known multiple times we
> look up a term within a query, which uses one thread).
> In NRT search we often open new SegmentReaders (on tiny segments), and
> each thread must then spend CPU/RAM creating & populating its own cache
> for them.
> Now that we are on Java 1.5 we can use java.util.concurrent.*, e.g.
> ConcurrentHashMap.  One simple approach could be a double-barrel LRU
> cache, using 2 maps (primary, secondary); see the sketch after this
> quoted description.  You check the cache by first checking primary; if
> that's a miss, you check secondary, and if you get a hit you promote it
> to primary.  Once primary is full you clear secondary and swap them.
> Or... any other suggested approach?
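
A minimal sketch of the double-barrel idea described above (illustrative only, 
not the actual patch from the attachments; class and field names are my own):

{code}
import java.util.concurrent.ConcurrentHashMap;

public class DoubleBarrelLRUCache<K, V> {
  private final int maxSize;
  private volatile ConcurrentHashMap<K, V> primary =
      new ConcurrentHashMap<K, V>();
  private volatile ConcurrentHashMap<K, V> secondary =
      new ConcurrentHashMap<K, V>();

  public DoubleBarrelLRUCache(int maxSize) {
    this.maxSize = maxSize;
  }

  public V get(K key) {
    V value = primary.get(key);
    if (value == null) {
      value = secondary.get(key);
      if (value != null) {
        put(key, value); // hit in secondary: promote to primary
      }
    }
    return value;
  }

  public synchronized void put(K key, V value) {
    if (primary.size() >= maxSize) {
      // Primary is full: clear secondary and swap the two maps, so the
      // old primary becomes the new secondary and we start filling an
      // empty primary again.
      ConcurrentHashMap<K, V> tmp = secondary;
      tmp.clear();
      secondary = primary;
      primary = tmp;
    }
    primary.put(key, value);
  }
}
{code}

Readers only touch the two ConcurrentHashMaps, so get() never blocks; writers 
are serialized through the synchronized put(), which should be acceptable for 
a cache that is mostly read.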


