[jira] Issue Comment Edited: (LUCENE-2075) Share the Term -> TermInfo cache across threads

Uwe Schindler (JIRA) Mon, 23 Nov 2009 06:14:07 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781410#action_12781410
 ]


Uwe Schindler edited comment on LUCENE-2075 at 11/23/09 2:13 PM:
-----------------------------------------------------------------

I tested with a 5 million document index containing trie ints, but trie does 
not seem to really profit from the seeking cache. With the default precStep of 
4 there was no difference (max. 16 seeks per query), and with a precStep of 1 
(max. 64 seeks per query) it was even a little slower on average (???). The 
test also compares against FieldCacheRangeFilter, which is always faster 
(because there are no deletes and the index is optimized). The field cache 
loading time also did not really change (linear scan in term enum).
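
The seek limits quoted above (16 and 64) are consistent with a simple estimate 
for 32-bit trie ints: one sub-range per edge of the query range at each 
precision level, and one seek per sub-range. The sketch below only reproduces 
that arithmetic. The class and method names are made up for illustration and 
are not part of Lucene, and the two-sub-ranges-per-level rule is an assumption 
inferred from the numbers in this comment.

```java
// Hypothetical helper, not a Lucene API: estimates the upper bound on
// term seeks for a trie range query over fixed-width values.
final class TrieSeekEstimate {

    // Assumption: every precision level contributes at most two sub-ranges
    // (lower edge and upper edge), and each sub-range costs one seek.
    static int maxSeeks(int valueBits, int precisionStep) {
        if (valueBits <= 0 || precisionStep <= 0) {
            throw new IllegalArgumentException("valueBits and precisionStep must be positive");
        }
        // Number of precision levels, rounded up.
        int levels = (valueBits + precisionStep - 1) / precisionStep;
        return 2 * levels;
    }

    public static void main(String[] args) {
        System.out.println(maxSeeks(32, 4)); // prints 16
        System.out.println(maxSeeks(32, 1)); // prints 64
    }
}
```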

PrecisionStep: 4
trunk:
loading field cache time: 6367.667678 ms
avg number of terms: 68.1
TRIE:       best time=6.323709 ms; worst time=414.367469 ms; avg=201.18463369999998 ms; sum=32004735
FIELDCACHE: best time=64.770523 ms; worst time=265.487652 ms; avg=155.5479675 ms; sum=32004735

patch:
loading field cache time: 6295.055377 ms
avg number of terms: 68.1
TRIE:       best time=5.288102 ms; worst time=415.290771 ms; avg=195.72079685 ms; sum=32004735
FIELDCACHE: best time=65.511957 ms; worst time=202.482438 ms; avg=138.69083925 ms; sum=32004735

---

PrecisionStep: 1
trunk:
loading field cache time: 6416.105399 ms
avg number of terms: 19.85
TRIE:       best time=6.51228 ms; worst time=410.624255 ms; avg=192.33796475 ms; sum=32002505
FIELDCACHE: best time=65.349088 ms; worst time=211.308979 ms; avg=143.71657580000002 ms; sum=32002505

patch:
loading field cache time: 6809.792026 ms
avg number of terms: 19.85
TRIE:       best time=6.814832 ms; worst time=436.396525 ms; avg=205.6526038 ms; sum=32002505
FIELDCACHE: best time=64.939539 ms; worst time=277.474371 ms; avg=142.58939345 ms; sum=32002505
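
To put the TRIE averages above side by side, the relative change from trunk to 
patch can be computed directly from the reported numbers. This is only a 
sketch of that arithmetic: the class and method names are invented here, and 
since these are single runs the percentages say nothing about statistical 
significance.

```java
import java.util.Locale;

// Hypothetical helper for comparing the averages reported in this comment.
final class AvgChange {

    // Relative change from trunk to patch, in percent.
    // Negative means the patch was faster, positive means it was slower.
    static double percentChange(double trunkMs, double patchMs) {
        return (patchMs - trunkMs) / trunkMs * 100.0;
    }

    public static void main(String[] args) {
        // TRIE averages copied from the results above.
        double step4 = percentChange(201.18463369999998, 195.72079685);
        double step1 = percentChange(192.33796475, 205.6526038);
        System.out.println(String.format(Locale.ROOT, "precStep 4: %.1f%%", step4));
        System.out.println(String.format(Locale.ROOT, "precStep 1: %.1f%%", step1));
    }
}
```

With the figures above this comes out at roughly 2.7% faster for precStep 4 
and roughly 6.9% slower for precStep 1.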

      was (Author: thetaphi):
    I tested with an 5 mio doc index containing trie ints, but it seems that 
trie does not really profit from the seeking cache. With the default precStep 
of 4 no difference (max. 16 seeks per query), and with precStep of 1 (max. 64 
seeks per query) it was even a little bit slower on average (???). The test 
compares also with FieldCacheRangeFilter which is always slower, also the field 
cache loading time did not really change (linear scan in term enum).

  
> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
>                 Key: LUCENE-2075
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2075
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch, 
> LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, 
> LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch
>
>
> Right now each thread creates its own (thread private) SimpleLRUCache,
> holding up to 1024 terms.
> This is rather wasteful, since if there are a high number of threads
> that come through Lucene, you're multiplying the RAM usage.  You're
> also cutting way back on likelihood of a cache hit (except the known
> multiple times we lookup a term within-query, which uses one thread).
> In NRT search we open new SegmentReaders (on tiny segments) often
> which each thread must then spend CPU/RAM creating & populating.
> Now that we are on 1.5 we can use java.util.concurrent.*, eg
> ConcurrentHashMap.  One simple approach could be a double-barrel LRU
> cache, using 2 maps (primary, secondary).  You check the cache by
> first checking primary; if that's a miss, you check secondary and if
> you get a hit you promote it to primary.  Once primary is full you
> clear secondary and swap them.
> Or... any other suggested approach?
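
The double-barrel idea from the description can be sketched roughly as 
follows. This is a minimal illustration under stated assumptions, not the 
attached patch or ConcurrentLRUCache.java: the class name, the maxSize 
parameter and the swap logic are made up for this example, and null keys or 
values are not supported because ConcurrentHashMap rejects them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a two-map ("double-barrel") cache shared by threads.
final class DoubleBarrelCache<K, V> {
    private final int maxSize;
    private volatile Map<K, V> primary = new ConcurrentHashMap<>();
    private volatile Map<K, V> secondary = new ConcurrentHashMap<>();

    DoubleBarrelCache(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
    }

    // Check primary first; on a miss check secondary and promote a hit.
    V get(K key) {
        V value = primary.get(key);
        if (value == null) {
            value = secondary.get(key);
            if (value != null) {
                put(key, value); // promote to primary
            }
        }
        return value;
    }

    // Insert into primary; once primary is full, clear secondary and swap.
    void put(K key, V value) {
        Map<K, V> current = primary;
        current.put(key, value);
        if (current.size() >= maxSize) {
            swap(current);
        }
    }

    private synchronized void swap(Map<K, V> full) {
        if (primary != full) {
            return; // another thread already swapped this map out
        }
        Map<K, V> old = secondary;
        secondary = full; // the full map stays readable as secondary
        old.clear();      // entries not promoted since the last swap are dropped
        primary = old;
    }
}
```

Reads and writes only take the lock during a swap. A lookup that races with a 
swap may see a spurious miss, which is harmless for a cache: the caller falls 
back to the normal term lookup and puts the entry again.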

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

