[ https://issues.apache.org/jira/browse/LUCENENET-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743484#action_12743484 ]

Michael Garski commented on LUCENENET-190:
------------------------------------------

Digy,

I only performed the test once for each variation.  The 18-hour test run was 
specifically for looking at the utilization of the Gen2 heap, as any cached 
items will end up there.  From my perspective, the results of 11.59 - 12.13 
searches/sec are equivalent.  They may not be in a strict sense, but for the 
purpose of the tests I ran they are very close.

The big-O complexity of SortedList.Remove is as documented on MSDN: 
http://msdn.microsoft.com/en-us/library/system.collections.sortedlist.remove.aspx

With our indexes being so large, we are going to continue running with the 
cache disabled; however, I do not mean for this to be interpreted as a 
recommendation to disable the cache in Lucene.Net across the board.  Modifying 
the cache to be more efficient would be the way to proceed to maintain parity 
with Java Lucene, perhaps providing a way to disable it while keeping it 
enabled by default.  Either your patch without locking, or a minor modification 
to the currently committed code to use the underlying base.Map.ContainsKey to 
check for existence before removing from the list, would be appropriate, as at 
a minimum they keep performance on par with 2.3 for a large index.  Performance 
metrics with smaller indexes would be interesting to see, and hopefully I'll 
get a chance to look into that next week.
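For illustration, the ContainsKey-guard idea above can be sketched as follows. This is a minimal sketch in Java rather than the actual Lucene.Net C# (the class name GuardedLruCache and its shape are hypothetical; the committed code differs), but it shows why checking the O(1) hash lookup before attempting the O(n) list removal helps:

```java
import java.util.HashMap;
import java.util.LinkedList;

// Hypothetical sketch of an LRU cache built from a hash map plus a
// linked list, mirroring the Lucene.Net 2.4.0 structure.  The guard
// in put() avoids paying for LinkedList.remove()'s O(n) scan when
// the key is not present.  (Touch-on-get is omitted for brevity.)
class GuardedLruCache<K, V> {
    private final int capacity;
    private final HashMap<K, V> map = new HashMap<>();
    private final LinkedList<K> lru = new LinkedList<>();

    GuardedLruCache(int capacity) {
        this.capacity = capacity;
    }

    void put(K key, V value) {
        // O(1) containment check first; only do the O(n) list
        // removal when the key is actually in the cache.
        if (map.containsKey(key)) {
            lru.remove(key);
        } else if (map.size() >= capacity) {
            // Evict the least recently used key.
            map.remove(lru.removeLast());
        }
        lru.addFirst(key);
        map.put(key, value);
    }

    V get(K key) {
        return map.get(key);
    }
}
```

With a capacity of 2, inserting a third key evicts the least recently inserted one while lookups stay O(1).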

While the complexity of the operations within C5 may not change based on the 
number of items in the list, I do believe there is a fair amount of overhead in 
them, which would explain the performance being similar to that of the other 
tests.  I don't think including C5 with Lucene.Net would be appropriate, 
especially given my results; I included that variation just to see how it 
would perform.  We use the C5 collections in our search system for things such 
as dynamic pooled filters, and I was curious to see how they would perform in 
this case.

Michael

> 2.4.0 Performance in TermInfosReader term caching (New implementation of 
> SimpleLRUCache)
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENENET-190
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-190
>             Project: Lucene.Net
>          Issue Type: Improvement
>         Environment: v2.4.0
>            Reporter: Digy
>            Priority: Minor
>         Attachments: cache_Gen2.PNG, SimpleLRUCache.rar
>
>
> Below is the mail from Michael Garski about the Performance in 
> TermInfosReader term caching. It would be good to have a faster LRUCache 
> implementation in Lucene.Net.
> DIGY
> {quote}
> Doug did an amazing job of porting 2.4.0, doing it mostly on his own!  
> Hooray Doug!
> We are using the committed version of 2.4.0 in production and I wanted to 
> share a performance issue we discovered and what we've done to work around 
> it.  From the Java Lucene change log:  "LUCENE-1195: Improve term lookup 
> performance by adding a LRU cache to the TermInfosReader. In performance 
> experiments the speedup was about 25% on average on mid-size indexes with 
> ~500,000 documents for queries with 3 terms and about 7% on larger indexes 
> with ~4.3M documents."
> The Java implementation uses a LinkedHashMap within the class 
> org.apache.lucene.util.cache.SimpleLRUCache, which is very efficient at 
> maintaining the cache.  As there is no equivalent collection in .NET, the 
> current 2.4.0 port uses a combination of a LinkedList to maintain LRU state 
> and a Hashtable to provide lookups.  While this implementation works, 
> maintaining the LRU state via the LinkedList creates a fair amount of 
> overhead and can result in a significant reduction in performance, most 
> likely attributable to the LinkedList.Remove method being O(n).  As each 
> thread maintains its own cache of 1024 terms, this overhead in performing the 
> removal is a drain on performance.
> At this time we have disabled the cache in the method 
> TermInfosReader.TermInfo Get(Term term, bool useCache) by always setting the 
> useCache parameter to false inside the body of the method.  After doing this 
> we saw performance return back to the 2.3.2 levels.  I have not yet had the 
> opportunity to experiment with other implementations within the 
> SimpleLRUCache to address the performance issue.  One approach that might 
> solve the issue is to use the HashedLinkedList<T> class provided in the 
> C5 collection library [http://www.itu.dk/research/c5/].
> Michael
> Michael Garski
> Search Architect
> MySpace.com
> www.myspace.com/michaelgarski
> {quote}
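For reference, the Java Lucene approach described in the quoted mail relies on LinkedHashMap's access-order mode. A minimal sketch of that pattern (an illustrative version, not the actual org.apache.lucene.util.cache.SimpleLRUCache source):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Access-ordered LinkedHashMap: every get()/put() moves the touched
// entry to the tail of the internal list, and removeEldestEntry()
// evicts the head when the map grows past capacity.  Both the lookup
// and the LRU bookkeeping are O(1), which is why the Java cache has
// no equivalent of the O(n) LinkedList.Remove cost.
class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    SimpleLruCache(int capacity) {
        super(16, 0.75f, true); // true = order entries by access
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

For example, with a capacity of 2, putting "a" and "b", touching "a" via get(), then putting "c" evicts "b", since "a" was the more recently accessed entry.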

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
