Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in TermInfosReader term caching (New implementation of SimpleLRUCache)

x...@mail.ru Tue, 18 Aug 2009 06:51:47 -0700

SimpleLRUCache_LUCENENET_190 uses SortedList<long, object> collection.

Performance of SortedList (see http://msdn.microsoft.com/en-us/library/ms132339.aspx):1) Add method is an O(n) operation for unsorted data. It is an O(log n) operation if the new element is added at the end of the list.If insertion causes a resize, the operation is O(n).

2) Remove method method is an O(n) operation

3) RemoveAt method is an O(n) operation4) Keys property is an O(1) operation


Why not to use SortedDictionary<>? It has better performance for Remove and Add:

1) Add method is an O(log n) operation

2) Remove method is an O(log n) operation

3) Keys property is an O(1) operation


--
Iliev Andrei


Digy (JIRA) :

     [ 
https://issues.apache.org/jira/browse/LUCENENET-190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy updated LUCENENET-190:
---------------------------

    Attachment: SimpleLRUCache.rar

A slightly faster implementation + a test case for SimpleLRUCache.

DIGY

2.4.0 Performance in TermInfosReader term caching (New implementation of 
SimpleLRUCache)
----------------------------------------------------------------------------------------

                Key: LUCENENET-190
                URL: https://issues.apache.org/jira/browse/LUCENENET-190
            Project: Lucene.Net
         Issue Type: Improvement
        Environment: v2.4.0
           Reporter: Digy
           Priority: Minor
        Attachments: cache_Gen2.PNG, SimpleLRUCache.rar, TermInfosReader.rar


Below is the mail from Michael Garski about the Performance in TermInfosReader 
term caching. It would be good to have a faster LRUCache implementation in 
Lucene.Net
DIGY
{quote}

Doug did an amazing job of porting 2.4.0, doing it mostly on his own!Hooray Doug!

We are using the committed version of 2.4.0 in production and I wanted to share a 
performance issue we discovered and what we've done to work around it.  From the Java 
Lucene change log:  "LUCENE-1195: Improve term lookup performance by adding a LRU 
cache to the TermInfosReader. In performance experiments the speedup was about 25% on 
average on mid-size indexes with ~500,000 documents for queries with 3 terms and about 7% 
on larger indexes with ~4.3M documents."
The Java implementation uses a LinkedHashMap within the class 
org.apache.lucene.util.cache.SimpleLRUCache, which is very efficient at 
maintaining the cache.  As there is no equivalent collection in .Net The 
current 2.4.0 port uses a combination of a LinkedList to maintain LRU state and 
a HashTable to provide lookups.  While this implementation works, maintaining 
the LRU state via the LinkedList creates a fair amount of overhead and can 
result in a significant reduction of performance, most likely attributed to the 
LinkedList.Remove method being O(n).  As each thread maintains its own cache of 
1024 terms, these overhead in performing the removal is a drain on performance.
At this time we have disabled the cache in the method TermInfosReader.TermInfo 
Get(Term term, bool useCache) by always setting the useCache parameter to false 
inside the body of the method.  After doing this we saw performance return back to 
the 2.3.2 levels.  I have not yet had the opportunity to experiment with other 
implementations within the SimpleLRUCache to address the performance issue.  One 
approach that would might solve the issue is to use the HashedLinkedList<T> 
class provided in the C5 collection library [http://www.itu.dk/research/c5/].
Michael
Michael Garski
Search Architect
MySpace.com
www.myspace.com/michaelgarski <http://%27www.myspace.com/mgarski>
{quote}

Re: [jira] Updated: (LUCENENET-190) 2.4.0 Performance in TermInfosReader term caching (New implementation of SimpleLRUCache)

Reply via email to