[ https://issues.apache.org/jira/browse/LUCENENET-190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Garski updated LUCENENET-190:
-------------------------------------

    Attachment: cache_Gen2.PNG

Gen2 heap utilization over test period (18 hours)

> 2.4.0 Performance in TermInfosReader term caching (New implementation of SimpleLRUCache)
> -----------------------------------------------------------------------------------------
>
>                 Key: LUCENENET-190
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-190
>             Project: Lucene.Net
>          Issue Type: Improvement
>         Environment: v2.4.0
>            Reporter: Digy
>            Priority: Minor
>         Attachments: cache_Gen2.PNG, SimpleLRUCache.rar
>
>
> Below is the mail from Michael Garski about the performance of term caching in TermInfosReader. It would be good to have a faster LRUCache implementation in Lucene.Net.
> DIGY
> {quote}
> Doug did an amazing job of porting 2.4.0, doing it mostly on his own! Hooray Doug!
> We are using the committed version of 2.4.0 in production and I wanted to share a performance issue we discovered and what we've done to work around it. From the Java Lucene change log: "LUCENE-1195: Improve term lookup performance by adding a LRU cache to the TermInfosReader. In performance experiments the speedup was about 25% on average on mid-size indexes with ~500,000 documents for queries with 3 terms and about 7% on larger indexes with ~4.3M documents."
> The Java implementation uses a LinkedHashMap within the class org.apache.lucene.util.cache.SimpleLRUCache, which is very efficient at maintaining the cache. As there is no equivalent collection in .NET, the current 2.4.0 port uses a combination of a LinkedList to maintain LRU state and a Hashtable to provide lookups. While this implementation works, maintaining the LRU state via the LinkedList creates a fair amount of overhead and can result in a significant reduction in performance, most likely attributable to the LinkedList.Remove method being O(n). As each thread maintains its own cache of 1024 terms, this overhead in performing the removal is a drain on performance.
> At this time we have disabled the cache in the method TermInfosReader.TermInfo Get(Term term, bool useCache) by always setting the useCache parameter to false inside the body of the method. After doing this we saw performance return to the 2.3.2 levels. I have not yet had the opportunity to experiment with other implementations within the SimpleLRUCache to address the performance issue. One approach that might solve the issue is to use the HashedLinkedList<T> class provided in the C5 collection library [http://www.itu.dk/research/c5/].
> Michael
> Michael Garski
> Search Architect
> MySpace.com
> www.myspace.com/michaelgarski
> {quote}
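
As an illustration of the O(1) alternative described above (not Lucene.Net's actual SimpleLRUCache; the class and member names here are hypothetical): a minimal sketch of an LRU cache that stores LinkedListNode references in a Dictionary, so that a cache hit or an eviction removes its node with LinkedList.Remove(node) in constant time instead of scanning with LinkedList.Remove(value). This is roughly the effect the HashedLinkedList<T> suggestion aims for, where a hash index maps a key directly to its position in the list.

{code}
using System.Collections.Generic;

// Sketch only: an LRU cache keeping lookup and eviction O(1) by pairing a
// Dictionary of LinkedListNode references with a LinkedList in recency order.
// This is an illustration of the approach, not the actual Lucene.Net code.
public class LruCache<TKey, TValue>
{
    private readonly int capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map;
    private readonly LinkedList<KeyValuePair<TKey, TValue>> lruList;

    public LruCache(int capacity)
    {
        this.capacity = capacity;
        this.map = new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>(capacity);
        this.lruList = new LinkedList<KeyValuePair<TKey, TValue>>();
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (map.TryGetValue(key, out node))
        {
            // Move the hit entry to the front. Removing by node reference is O(1),
            // unlike LinkedList.Remove(value), which scans the list.
            lruList.Remove(node);
            lruList.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (map.TryGetValue(key, out node))
        {
            // Replace an existing entry: unlink its node in O(1).
            lruList.Remove(node);
            map.Remove(key);
        }
        else if (map.Count >= capacity)
        {
            // Evict the least recently used entry from the tail.
            LinkedListNode<KeyValuePair<TKey, TValue>> last = lruList.Last;
            lruList.RemoveLast();
            map.Remove(last.Value.Key);
        }

        node = new LinkedListNode<KeyValuePair<TKey, TValue>>(new KeyValuePair<TKey, TValue>(key, value));
        lruList.AddFirst(node);
        map[key] = node;
    }
}

// Example usage with the 1024-entries-per-thread sizing mentioned above:
// var cache = new LruCache<string, int>(1024);
{code}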