[jira] Commented: (LUCENE-2075) Share the Term -> TermInfo cache across threads

Michael McCandless (JIRA) Sun, 22 Nov 2009 02:39:09 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781112#action_12781112
 ]


Michael McCandless commented on LUCENE-2075:
--------------------------------------------


bq. in both cases, it's slower than trunk, but I assume this is due to the
flex branch not being optimized yet?

The automaton benchmark looks great -- I'll dig into why the flex branch
is slower in both of these cases.

The first case tests the old API on top of an old index, and I'm
surprised it doesn't match trunk's performance.  The flex changes
are supposed to "optimize" that case by directly using the old (trunk)
code.

The second case tests the old API emulated over a flex index.  I'm
also surprised to see that it's not faster than trunk -- there must be
something silly going on in the API emulation.

I'll dig...

When I tested MTQs (TermRangeQuery, WildcardQuery) using the flex API
on a flex index, they were reasonably faster, so I'll also try to get
automaton's FilteredTermEnum cut over to the flex API, and test that.


> Share the Term -> TermInfo cache across threads
> -----------------------------------------------
>
>                 Key: LUCENE-2075
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2075
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: ConcurrentLRUCache.java, LUCENE-2075.patch, 
> LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, LUCENE-2075.patch, 
> LUCENE-2075.patch
>
>
> Right now each thread creates its own (thread-private) SimpleLRUCache,
> holding up to 1024 terms.
>
> This is rather wasteful: if a high number of threads come through
> Lucene, you're multiplying the RAM usage.  You're also cutting way
> back on the likelihood of a cache hit (except for the known case where
> we look up a term multiple times within a query, which uses one thread).
>
> In NRT search we often open new SegmentReaders (on tiny segments),
> and for each one every thread must then spend CPU/RAM creating &
> populating its own cache.
>
> Now that we are on Java 1.5 we can use java.util.concurrent.*, e.g.
> ConcurrentHashMap.  One simple approach could be a double-barrel LRU
> cache, using 2 maps (primary, secondary).  You check the cache by
> first checking primary; if that's a miss, you check secondary, and if
> you get a hit you promote it to primary.  Once primary is full you
> clear secondary and swap them.
>
> Or... any other suggested approach?
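
For readers who want to see the double-barrel idea from the description
in code, here is a minimal sketch.  It is NOT the ConcurrentLRUCache.java
attached to the issue nor anything from the patches; the class name
DoubleBarrelCacheSketch, the maxSize parameter, and the choice to
synchronize only put() are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a double-barrel LRU cache shared across threads.
 * Names and locking strategy are illustrative assumptions, not Lucene code.
 */
class DoubleBarrelCacheSketch<K, V> {
    private final int maxSize;
    // Both maps are concurrent so that get() can read them without locking.
    private volatile Map<K, V> primary = new ConcurrentHashMap<K, V>();
    private volatile Map<K, V> secondary = new ConcurrentHashMap<K, V>();

    DoubleBarrelCacheSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Check primary first; on a miss check secondary and promote a hit. */
    V get(K key) {
        V value = primary.get(key);
        if (value == null) {
            value = secondary.get(key);
            if (value != null) {
                put(key, value); // promote to primary
            }
        }
        return value;
    }

    /** Insert into primary; once primary is full, clear secondary and swap. */
    synchronized void put(K key, V value) {
        if (primary.size() >= maxSize && !primary.containsKey(key)) {
            Map<K, V> recycled = secondary;
            recycled.clear();      // drop the older generation of entries
            secondary = primary;   // the full map becomes the fallback
            primary = recycled;    // start filling a fresh, empty map
        }
        primary.put(key, value);
    }
}
```

A lookup that races with a swap may miss an entry that is still cached;
for a cache that only costs a recomputation.  Memory is bounded by
roughly 2 * maxSize entries, and entries that are not touched for two
generations fall out, which approximates LRU without per-entry
bookkeeping.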

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


