[ http://issues.apache.org/jira/browse/LUCENE-502?page=comments#action_12368768 ]
Steven Tamm commented on LUCENE-502:
------------------------------------

If you're using a WildcardTermEnum, this optimization saves a great deal. We usually run wildcard searches that expand to 50-5000 terms. Since each of those terms gets its own TermScorer, removing the caching saves a lot of memory. For a query with 1800 terms, it saves 800K per query, and the query is also about 15% faster.

Don't double buffer.

> TermScorer caches values unnecessarily
> --------------------------------------
>
> Key: LUCENE-502
> URL: http://issues.apache.org/jira/browse/LUCENE-502
> Project: Lucene - Java
> Type: Improvement
> Components: Search
> Versions: 1.9
> Reporter: Steven Tamm
> Attachments: TermScorer.patch
>
> TermScorer aggressively caches the doc and freq of 32 documents at a time for each term scored. When querying for many terms, this creates a lot of unnecessary garbage. The SegmentTermDocs from which it retrieves its information has no optimizations for bulk loading, so the caching gains nothing.
> In addition, it has a SCORE_CACHE that is of limited benefit. It caches the result of a sqrt, which should be placed in DefaultSimilarity, and if you're only scoring a few documents that contain those terms, there's no need to precalculate the sqrt, especially on modern VMs.
> Enclosed is a patch that replaces TermScorer with a version that does not cache the docs or freqs. Across many queries, that saves 196 bytes per term, the unnecessary disk IO, and the extra sqrt calculations, which add up.
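To make the trade-off concrete, here is a simplified, self-contained Java sketch. It is not Lucene's code and not the attached patch. `PostingList`, `BufferedScorer` and `DirectScorer` are hypothetical names; `PostingList` stands in for TermDocs, tf is assumed to be sqrt(freq) as in DefaultSimilarity, and the 32-entry score cache size is assumed. The buffered version allocates three 32-element arrays per term up front and precomputes 32 scores, while the direct version reads doc and freq straight from the postings and takes the square root only for documents actually scored.

```java
/**
 * Sketch only: contrasts a buffering term scorer with a non-buffering one.
 * PostingList is a hypothetical stand-in for TermDocs, not Lucene's API.
 */
public class TermScorerSketch {

    /** Hypothetical postings iterator: doc ids with in-document frequencies. */
    static final class PostingList {
        private final int[] docs;
        private final int[] freqs;
        private int pos = -1;

        PostingList(int[] docs, int[] freqs) { this.docs = docs; this.freqs = freqs; }

        boolean next() { return ++pos < docs.length; }
        int doc() { return docs[pos]; }
        int freq() { return freqs[pos]; }

        /** "Bulk" read that just loops over next(): no real bulk-loading optimization. */
        int read(int[] docBuf, int[] freqBuf) {
            int n = 0;
            while (n < docBuf.length && next()) {
                docBuf[n] = doc();
                freqBuf[n] = freq();
                n++;
            }
            return n;
        }
    }

    /** Buffers 32 docs/freqs per term and precomputes tf(i) * weight for i < 32. */
    static final class BufferedScorer {
        private static final int SIZE = 32; // assumed cache size
        private final PostingList postings;
        private final float weight;
        private final int[] docs = new int[SIZE];        // allocated per term
        private final int[] freqs = new int[SIZE];       // allocated per term
        private final float[] scoreCache = new float[SIZE]; // allocated per term
        private int pointer = 0, pointerMax = 0, currentFreq;

        BufferedScorer(PostingList postings, float weight) {
            this.postings = postings;
            this.weight = weight;
            for (int i = 0; i < SIZE; i++) {
                scoreCache[i] = (float) Math.sqrt(i) * weight; // eager sqrt, used or not
            }
        }

        /** Returns the next doc id, or -1 when the postings are exhausted. */
        int nextDoc() {
            if (pointer >= pointerMax) { // buffer drained: refill it
                pointerMax = postings.read(docs, freqs);
                pointer = 0;
                if (pointerMax == 0) return -1;
            }
            currentFreq = freqs[pointer];
            return docs[pointer++];
        }

        float score() {
            return currentFreq < SIZE ? scoreCache[currentFreq]
                                      : (float) Math.sqrt(currentFreq) * weight;
        }
    }

    /** Reads straight from the postings; computes sqrt only when a doc is scored. */
    static final class DirectScorer {
        private final PostingList postings;
        private final float weight;

        DirectScorer(PostingList postings, float weight) { this.postings = postings; this.weight = weight; }

        int nextDoc() { return postings.next() ? postings.doc() : -1; }

        float score() { return (float) Math.sqrt(postings.freq()) * weight; }
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 9, 20};
        int[] freqs = {1, 3, 40, 2};
        BufferedScorer a = new BufferedScorer(new PostingList(docs, freqs), 0.5f);
        DirectScorer b = new DirectScorer(new PostingList(docs, freqs), 0.5f);
        for (int d; (d = a.nextDoc()) != -1; ) {
            b.nextDoc(); // advance in lockstep with the buffered scorer
            System.out.println("doc " + d + ": buffered=" + a.score() + " direct=" + b.score());
        }
    }
}
```

Both scorers return the same documents and bitwise-identical scores, because they evaluate the same float expression; the difference lies only in how much is allocated per term and when the square root is computed.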
