Re: [jira] Commented: (LUCENE-769) [PATCH] Performance improvement for some cases of sorted search

robert engels Thu, 11 Jan 2007 13:24:33 -0800

I would assume the "incremental" field cache would be very similar to my "incremental" query filter. I have found this to be the #1 performance improvement I've been able to make with Lucene, especially for highly dynamic indexes.


I have attached again to this email the code:



On Jan 11, 2007, at 3:00 PM, Chuck Williams (JIRA) wrote:

[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464012 ]
Chuck Williams commented on LUCENE-769:
---------------------------------------
I have this same issue with a constantly changing large index where users need a current view. The first search after each frequent IndexReader reopen is slow, due primarily to the requirement to rebuild the FieldCache for sort fields.
I don't believe this patch, or any continuation along these lines, will help my issue. Documents are large and queries frequently return large result sets, say 20% of the entire multi-million document index or more. Hundreds of thousands of document() retrievals, even with a fast LOAD_AND_BREAK FieldSelector finding sort fields at the beginning of each Document, are not going to beat FieldCache's single traversal of the postings for the sort fields.
Another approach I've looked at is Robert Engels' IndexReader.reopen(). I think this direction is more promising. Artem, you might want to look at this. At least the version I've seen is not integrated with FieldCache, but it seems this would be feasible. Segments to the left of the first changed segment maintain their doc-ids, so an improved FieldCache could iterate just the postings in the first changed segment and those to the right. Unless somebody else does this first, it's on my list to improve IndexReader.reopen() with this optimization and to make other enhancements my app needs (e.g., support for ParallelReader -- the current implementation fails in this case).
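The segment-prefix idea above can be sketched roughly as follows. This is a minimal illustration in plain Java, not Lucene code: SegmentStub and IncrementalFieldCache are hypothetical names, a segment with the same name is assumed to be unchanged, and deletions are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one index segment; not a Lucene class. */
final class SegmentStub {
    final String name;      // segment identity, e.g. "_a"
    final String[] values;  // sort-field value for each document in this segment

    SegmentStub(String name, String... values) {
        this.name = name;
        this.values = values;
    }
}

/** Sketch of a sort-field cache that reuses entries for unchanged leading segments. */
final class IncrementalFieldCache {
    private List<SegmentStub> cachedSegments = new ArrayList<>();
    private String[] cachedValues = new String[0];
    int lastLoadedDocs = 0; // number of documents the last refresh had to read

    String[] refresh(List<SegmentStub> current) {
        // Count the leading segments that match the cached ones (assumption:
        // same name means unchanged) and the documents they contain.
        int same = 0;
        int keptDocs = 0;
        while (same < cachedSegments.size() && same < current.size()
                && cachedSegments.get(same).name.equals(current.get(same).name)) {
            keptDocs += current.get(same).values.length;
            same++;
        }
        int total = keptDocs;
        for (int i = same; i < current.size(); i++) {
            total += current.get(i).values.length;
        }
        // Doc-ids in the unchanged prefix are stable, so their values are copied as-is.
        String[] merged = new String[total];
        System.arraycopy(cachedValues, 0, merged, 0, keptDocs);
        // Only the first changed segment and those to its right are read again.
        int pos = keptDocs;
        for (int i = same; i < current.size(); i++) {
            String[] v = current.get(i).values;
            System.arraycopy(v, 0, merged, pos, v.length);
            pos += v.length;
        }
        lastLoadedDocs = total - keptDocs;
        cachedSegments = new ArrayList<>(current);
        cachedValues = merged;
        return merged;
    }
}
```

In this sketch the cost of a refresh is proportional to the documents in the changed segments only, which is the property an incremental FieldCache would need.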
A specific comment on the new patch: the introduction of FieldSelectors is too restrictive. The same doc-id may be retrieved using multiple FieldSelectors in different calls to IndexReader.document(). Any implementation of the cache needs to support this.
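One way a cache could honor that requirement is to key entries on both the doc-id and the selection, so that a narrow load never masks a later, wider one. The sketch below is plain Java under stated assumptions: SelectorAwareDocCache is a hypothetical name, a set of field names stands in for a FieldSelector, and a map stands in for the index's stored fields.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of a document cache keyed by doc-id AND the set of selected fields. */
final class SelectorAwareDocCache {
    // Hypothetical stand-in for the stored fields of the whole index.
    private final Map<Integer, Map<String, String>> storedFields;
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    SelectorAwareDocCache(Map<Integer, Map<String, String>> storedFields) {
        this.storedFields = storedFields;
    }

    Map<String, String> document(int docId, Set<String> selectedFields) {
        // A sorted copy makes the key independent of the caller's set ordering.
        String key = docId + ":" + new TreeSet<String>(selectedFields);
        Map<String, String> doc = cache.get(key);
        if (doc == null) {
            doc = load(docId, selectedFields);
            cache.put(key, doc);
        }
        return doc;
    }

    // Copies only the selected fields of one document.
    private Map<String, String> load(int docId, Set<String> selectedFields) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : storedFields.get(docId).entrySet()) {
            if (selectedFields.contains(e.getKey())) {
                doc.put(e.getKey(), e.getValue());
            }
        }
        return doc;
    }
}
```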
[PATCH] Performance improvement for some cases of sorted search
---------------------------------------------------------------

                Key: LUCENE-769
                URL: https://issues.apache.org/jira/browse/LUCENE-769
            Project: Lucene - Java
         Issue Type: Improvement
   Affects Versions: 2.0.0
           Reporter: Artem Vasiliev
        Attachments: DocCachingSorting.patch, DocCachingSorting.patch, StoredFieldSorting.patch
It's a small addition to Lucene that significantly lowers memory consumption and improves performance for sorted searches in the scenario of frequent index updates and relatively big indexes (>1 mln docs). This solution currently supports only single-field sorting (which seems to be quite a popular use case). Multiple-field support can be added without much trouble. The solution is this: documents from the sorting set (instead of the given field's values from the whole index, the current FieldCache approach) are cached in a WeakHashMap so the cached items are candidates for GC. Their field values are then fetched from the cache and compared while sorting.
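A rough sketch of the approach described above, in plain Java rather than the patch's actual code: DocCachingSorter and its loader are hypothetical names, the loader stands in for fetching a stored sort field from a document, and keying the WeakHashMap on the boxed doc-id is an assumption made only for this illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.IntFunction;

/** Sketch: sort hits by a stored field value, caching only the hit documents. */
final class DocCachingSorter {
    // Entries may be dropped by the GC once their key is no longer strongly referenced.
    private final Map<Integer, String> cache = new WeakHashMap<>();
    private final IntFunction<String> loader; // hypothetical stored-field lookup
    int loads = 0;                            // how many times the loader was called

    DocCachingSorter(IntFunction<String> loader) {
        this.loader = loader;
    }

    // Returns the sort value for a doc, loading it if it is not (or no longer) cached.
    private String sortValue(Integer docId) {
        String v = cache.get(docId);
        if (v == null) {
            loads++;
            v = loader.apply(docId);
            cache.put(docId, v);
        }
        return v;
    }

    /** Returns the hit doc-ids ordered by their sort-field value. */
    int[] sort(int[] hits) {
        Integer[] ids = new Integer[hits.length];
        for (int i = 0; i < hits.length; i++) {
            ids[i] = hits[i];
        }
        Arrays.sort(ids, (a, b) -> sortValue(a).compareTo(sortValue(b)));
        int[] sorted = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            sorted[i] = ids[i];
        }
        return sorted;
    }
}
```

Only documents in the result set are ever loaded here, which is where the memory saving comes from; the trade-off raised in the comment above is that each of those loads is a document retrieval rather than a single pass over the postings.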
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
