[ https://issues.apache.org/jira/browse/LUCENENET-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743474#action_12743474 ]

Digy commented on LUCENENET-190:
--------------------------------

Hi Michael,

Thanks for this detailed testing.

I am going to think about these results more deeply. My first impressions are:

- The complexity of remove for SortedList being O(n) is very strange (or a very 
bad implementation). What would be the 
result if the complexity of "remove" were reduced to O(log N), as it should be?
- Another strange thing is that the C5 collections, having O(1) for Contains, Insert 
& Remove (too good to be true), perform worse 
than "w/Digy patch & no locking" in "Additional Tests". Are there any other 
parameters affecting the performance besides the LRU Cache?

So, what can be the next step for 2.4.0 other than disabling the caching? I 
personally don't like external dependencies 
(like C5 or SharpZipLib) or dealing with licensing issues. Can we develop/find 
a better LRUCache implementation with an 
Apache license?
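For reference, one dependency-free way to get O(1) Contains, Insert and Remove is to pair a hash map with a hand-rolled doubly linked list, keeping a node handle in the map so promotion and eviction never scan the list. The sketch below is only an illustration of that general technique, not the actual Lucene.Net code; it is written in Java for brevity, and a C# port could get the same effect by storing LinkedListNode<T> handles in a Dictionary, since LinkedList<T>.Remove(node) is O(1) while Remove(value) is the O(n) call that hurts the current port.

```java
import java.util.HashMap;

// Illustrative O(1) LRU cache: HashMap for lookup, intrusive doubly linked
// list for recency order. Not the actual Lucene.Net implementation.
class LRUCache<K, V> {
    private static final class Node<K, V> {
        K key; V value; Node<K, V> prev, next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final int capacity;
    private final HashMap<K, Node<K, V>> map = new HashMap<>();
    private Node<K, V> head, tail; // head = most recent, tail = least recent

    LRUCache(int capacity) { this.capacity = capacity; }

    V get(K key) {
        Node<K, V> n = map.get(key);
        if (n == null) return null;
        moveToFront(n);               // O(1): unlink + relink via node handle
        return n.value;
    }

    void put(K key, V value) {
        Node<K, V> n = map.get(key);
        if (n != null) { n.value = value; moveToFront(n); return; }
        n = new Node<>(key, value);
        map.put(key, n);
        addFront(n);
        if (map.size() > capacity) {  // evict least recently used, O(1)
            Node<K, V> lru = tail;
            unlink(lru);
            map.remove(lru.key);
        }
    }

    private void moveToFront(Node<K, V> n) {
        if (n == head) return;
        unlink(n);
        addFront(n);
    }

    private void addFront(Node<K, V> n) {
        n.prev = null; n.next = head;
        if (head != null) head.prev = n;
        head = n;
        if (tail == null) tail = n;
    }

    private void unlink(Node<K, V> n) {
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
    }
}
```

Because the map stores the list node itself rather than just the value, no operation ever walks the list, which is exactly what the LinkedList.Remove(value) call in the current port is forced to do.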


DIGY





> 2.4.0 Performance in TermInfosReader term caching (New implementation of 
> SimpleLRUCache)
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENENET-190
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-190
>             Project: Lucene.Net
>          Issue Type: Improvement
>         Environment: v2.4.0
>            Reporter: Digy
>            Priority: Minor
>         Attachments: cache_Gen2.PNG, SimpleLRUCache.rar
>
>
> Below is the mail from Michael Garski about the Performance in 
> TermInfosReader term caching. It would be good to have a faster LRUCache 
> implementation in Lucene.Net
> DIGY
> {quote}
> Doug did an amazing job of porting 2.4.0, doing it mostly on his own!  
> Hooray Doug!
> We are using the committed version of 2.4.0 in production and I wanted to 
> share a performance issue we discovered and what we've done to work around 
> it.  From the Java Lucene change log:  "LUCENE-1195: Improve term lookup 
> performance by adding a LRU cache to the TermInfosReader. In performance 
> experiments the speedup was about 25% on average on mid-size indexes with 
> ~500,000 documents for queries with 3 terms and about 7% on larger indexes 
> with ~4.3M documents."
> The Java implementation uses a LinkedHashMap within the class 
> org.apache.lucene.util.cache.SimpleLRUCache, which is very efficient at 
> maintaining the cache.  As there is no equivalent collection in .Net, the 
> current 2.4.0 port uses a combination of a LinkedList to maintain LRU state 
> and a HashTable to provide lookups.  While this implementation works, 
> maintaining the LRU state via the LinkedList creates a fair amount of 
> overhead and can result in a significant reduction in performance, most 
> likely attributable to the LinkedList.Remove method being O(n).  As each thread 
> maintains its own cache of 1024 terms, this overhead in performing the 
> removal is a drain on performance.
> At this time we have disabled the cache in the method 
> TermInfosReader.TermInfo Get(Term term, bool useCache) by always setting the 
> useCache parameter to false inside the body of the method.  After doing this 
> we saw performance return back to the 2.3.2 levels.  I have not yet had the 
> opportunity to experiment with other implementations within the 
> SimpleLRUCache to address the performance issue.  One approach that 
> might solve the issue is to use the HashedLinkedList<T> class provided in the 
> C5 collection library [http://www.itu.dk/research/c5/].
> Michael
> Michael Garski
> Search Architect
> MySpace.com
> www.myspace.com/michaelgarski
> {quote}
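The LinkedHashMap approach mentioned in the quote can be sketched as follows. This is only a minimal illustration of the pattern, not the actual org.apache.lucene.util.cache.SimpleLRUCache source: constructing a LinkedHashMap with accessOrder=true moves an entry to the tail of its internal linked list on every get(), and overriding removeEldestEntry evicts the head once capacity is exceeded, so every cache operation stays O(1).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a LinkedHashMap-based LRU cache, illustrating the
// pattern used by Java Lucene's SimpleLRUCache (not its actual source).
class LruMap<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruMap(int capacity) {
        // initialCapacity, loadFactor, accessOrder=true (order by access,
        // not insertion) -- this is what gives LRU semantics for free.
        super(capacity, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(); returning true evicts the least recently
        // used entry in O(1).
        return size() > capacity;
    }
}
```

It is precisely this combination of access-ordered iteration and constant-time eviction that has no direct equivalent in the .NET base class library, which is why the port fell back to a separate LinkedList plus HashTable.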

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
