[ https://issues.apache.org/jira/browse/LUCENENET-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741549#action_12741549 ]
Michael Garski edited comment on LUCENENET-190 at 8/10/09 2:02 PM:
-------------------------------------------------------------------

Digy,

In taking a look at the patch, I don't believe it will solve the performance issue.

The original Java implementation of the SimpleLRUCache class uses a LinkedHashMap and, from what I can tell, would perform similarly to the currently committed Lucene.Net version, which uses a Hashtable to maintain the cached items and a LinkedList to manage the contents via LRU. The Hashtable get and set operations are both O(1); however, the LinkedList operations are where the performance hit is. Adding an item to the head of the list is O(1), but removing an item is O(n). I believe that maintaining the LRU state of the LinkedHashMap would also be O(n) - someone please correct me if I am wrong.

In the patch you provided, a LinkedList is no longer used but rather a SortedList; however, both the Add and Remove methods are O(n), which would more than likely degrade performance even further. One other thing I noticed is that the use of the incrementing long TimeStamp value could lead to collisions in the SortedList, resulting in an incorrect LRU state in the cache.

In looking at the collections provided in the .NET Framework, I don't see any off the shelf that provide adequate performance, at least not under heavy query load. The only collection I've seen so far that would offer ideal performance is the HashedLinkedList<T> in the C5 collections [http://www.itu.dk/research/c5/]. The InsertFirst, RemoveLast, and Get methods are all O(1), which is pretty hard to beat.

Michael

> 2.4.0 Performance in TermInfosReader term caching (New implementation of SimpleLRUCache)
> ----------------------------------------------------------------------------------------
>
>                 Key: LUCENENET-190
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-190
>             Project: Lucene.Net
>          Issue Type: Improvement
>         Environment: v2.4.0
>            Reporter: Digy
>            Priority: Minor
>         Attachments: SimpleLRUCache.rar
>
>
> Below is the mail from Michael Garski about the performance in TermInfosReader term caching.
> It would be good to have a faster LRUCache implementation in Lucene.Net
> DIGY
> {quote}
> Doug did an amazing job of porting 2.4.0, doing it mostly on his own! Hooray Doug!
>
> We are using the committed version of 2.4.0 in production and I wanted to share a performance issue we discovered and what we've done to work around it. From the Java Lucene change log: "LUCENE-1195: Improve term lookup performance by adding a LRU cache to the TermInfosReader. In performance experiments the speedup was about 25% on average on mid-size indexes with ~500,000 documents for queries with 3 terms and about 7% on larger indexes with ~4.3M documents."
>
> The Java implementation uses a LinkedHashMap within the class org.apache.lucene.util.cache.SimpleLRUCache, which is very efficient at maintaining the cache. As there is no equivalent collection in .Net, the current 2.4.0 port uses a combination of a LinkedList to maintain LRU state and a Hashtable to provide lookups. While this implementation works, maintaining the LRU state via the LinkedList creates a fair amount of overhead and can result in a significant reduction of performance, most likely attributed to the LinkedList.Remove method being O(n). As each thread maintains its own cache of 1024 terms, this overhead in performing the removal is a drain on performance.
>
> At this time we have disabled the cache in the method TermInfosReader.TermInfo Get(Term term, bool useCache) by always setting the useCache parameter to false inside the body of the method. After doing this we saw performance return back to the 2.3.2 levels. I have not yet had the opportunity to experiment with other implementations within the SimpleLRUCache to address the performance issue. One approach that might solve the issue is to use the HashedLinkedList<T> class provided in the C5 collection library [http://www.itu.dk/research/c5/].
> Michael
>
> Michael Garski
> Search Architect
> MySpace.com
> www.myspace.com/michaelgarski
> {quote}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
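For reference, the Java SimpleLRUCache discussed above is built directly on LinkedHashMap: when the map is constructed with accessOrder=true, it threads its entries on an internal doubly-linked list kept in access order, so the LRU bookkeeping on each get/put is O(1) rather than O(n), and evicting the eldest entry via removeEldestEntry is O(1) as well. A minimal sketch along those lines (the class name here is illustrative, not Lucene's actual code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch in the style of
// org.apache.lucene.util.cache.SimpleLRUCache.
// accessOrder=true makes LinkedHashMap reorder entries on each get/put,
// so the least recently used entry is always the eldest.
class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    SimpleLruCache(int capacity) {
        // initialCapacity, loadFactor, accessOrder
        super(capacity, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each put; returning true
        // evicts the least recently used entry in O(1).
        return size() > capacity;
    }
}
```

With a capacity of 2, putting "a" and "b", touching "a" with get, and then putting "c" evicts "b", since "a" was promoted to most recently used by the lookup.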