I would assume the "incremental" field cache would be very similar to
my "incremental" query filter. I have found this to be the #1
performance improvement I've been able to make with Lucene -
especially for highly dynamic indexes.
I have attached the code to this email again:
On Jan 11, 2007, at 3:00 PM, Chuck Williams (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464012 ]
Chuck Williams commented on LUCENE-769:
---------------------------------------
I have this same issue with a constantly changing large index where
users need a current view. The first search after each frequent
IndexReader reopen is slow, primarily because the FieldCache must be
rebuilt for the sort fields.
I don't believe this patch, or any continuation along these lines,
will help my issue. Documents are large and queries frequently
return large result sets, say 20% or more of the entire
multi-million document index. Hundreds of thousands of document()
retrievals, even with a fast LOAD_AND_BREAK FieldSelector finding the
sort fields at the beginning of each Document, are not going to beat
FieldCache's single traversal of the postings for the sort fields.
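To make the cost comparison concrete, here is a simplified, Lucene-free
sketch of the single-pass idea: a FieldCache-style array is filled by
walking each term of the sort field once and writing an ordinal for
every document in that term's postings. The TreeMap stands in for the
field's sorted term enumeration, and the class and method names are
hypothetical. The work is proportional to the field's postings, not to
the number of hits, whereas the per-hit approach pays one stored-field
read per document() call.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: fill a sort array with one pass over a field's
// terms and postings (TreeMap stands in for a sorted term enumeration).
class SinglePassFieldCache {

    /**
     * @param postings sorted map from term text to the doc ids containing it
     * @param maxDoc   number of documents in the index
     * @return ords[doc] = 1-based rank of the doc's term, or 0 if it has none
     */
    static int[] buildOrds(TreeMap<String, List<Integer>> postings, int maxDoc) {
        int[] ords = new int[maxDoc];   // 0 means "no value for this field"
        int ord = 1;
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                ords[doc] = ord;        // exactly one write per posting
            }
            ord++;
        }
        return ords;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        postings.put("banana", List.of(0, 3));
        postings.put("apple", List.of(2));
        postings.put("cherry", List.of(1));
        // prints [2, 3, 1, 2, 0]
        System.out.println(Arrays.toString(buildOrds(postings, 5)));
    }
}
```

Sorting hits then only compares ords[a] with ords[b]; no stored fields
are touched.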
Another approach I've looked at is Robert Engel's IndexReader.reopen().
I think this direction is more promising. Artem, you might want to
look at this. At least the version I've seen is not integrated with
FieldCache, but that integration seems feasible. Segments to the left
of the first changed segment keep their doc-ids, so an improved
FieldCache could iterate just the postings in the first changed
segment and those to its right. Unless somebody else does this
first, it's on my list to improve IndexReader.reopen() with this
optimization and to make other enhancements my app needs (e.g.,
support for ParallelReader; the current implementation fails in
this case).
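A minimal sketch of that reuse idea, assuming a segment can be
identified by a (name, maxDoc) pair and that unchanged segments keep
their position in the reopened reader. It is not Robert Engel's
reopen() code, and the names (IncrementalSortCache, Segment, refresh)
are hypothetical. Values for docs in the unchanged leading segments
are copied; only docs from the first changed segment onward go through
the loader, which stands in for walking those segments' postings.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: rebuild a cached per-doc array after a reopen,
// reusing the prefix of segments whose doc-ids did not move.
class IncrementalSortCache {

    // Toy segment identity; a real check would be more careful than equals().
    record Segment(String name, int maxDoc) {}

    /** Number of leading segments that are identical in both readers. */
    static int firstChanged(List<Segment> oldSegs, List<Segment> newSegs) {
        int i = 0;
        while (i < oldSegs.size() && i < newSegs.size()
                && oldSegs.get(i).equals(newSegs.get(i))) {
            i++;
        }
        return i;
    }

    /**
     * @param oldValues cached values indexed by the old reader's doc ids
     * @param loader    computes a value for a doc id of the new reader
     */
    static int[] refresh(List<Segment> oldSegs, int[] oldValues,
                         List<Segment> newSegs, IntUnaryOperator loader) {
        int keep = firstChanged(oldSegs, newSegs);
        int prefixDocs = 0;
        for (int i = 0; i < keep; i++) prefixDocs += newSegs.get(i).maxDoc();
        int total = 0;
        for (Segment s : newSegs) total += s.maxDoc();

        int[] values = new int[total];
        // Doc-ids in the unchanged leading segments are the same: copy them.
        System.arraycopy(oldValues, 0, values, 0, prefixDocs);
        // Only the first changed segment and those to its right are reloaded.
        for (int doc = prefixDocs; doc < total; doc++) {
            values[doc] = loader.applyAsInt(doc);
        }
        return values;
    }
}
```

With segments [_0(3 docs), _1(2 docs)] replaced by [_0, _2(2 docs)],
only doc-ids 3 and 4 are passed to the loader.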
A specific comment on the new patch: the introduction of
FieldSelectors is too restrictive. The same doc-id may be
retrieved using multiple FieldSelectors in different calls to
IndexReader.document(). Any implementation of the cache needs to
support this.
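One way to honor that requirement, sketched under the assumption that
a Document can be modeled as a field-name-to-value map: keep a per-doc
map and load only the requested fields that are not cached yet, so a
later call with a broader selector merges into the entry instead of
getting back a stale, partial Document. MergingDocCache and its loader
are hypothetical; no Lucene API is used, and the sketch is not
thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical per-doc stored-field cache that tolerates the same doc id
// being requested with different field subsets (different selectors).
class MergingDocCache {
    private final Map<Integer, Map<String, String>> cache = new HashMap<>();
    private final BiFunction<Integer, String, String> loader; // (doc, field) -> value
    int loads = 0; // number of loader calls, for illustration only

    MergingDocCache(BiFunction<Integer, String, String> loader) {
        this.loader = loader;
    }

    /** Returns the requested fields, loading only those not cached yet. */
    Map<String, String> get(int doc, Set<String> fields) {
        Map<String, String> cached = cache.computeIfAbsent(doc, d -> new HashMap<>());
        Map<String, String> result = new HashMap<>();
        for (String f : fields) {
            if (!cached.containsKey(f)) {  // field never loaded for this doc
                cached.put(f, loader.apply(doc, f));
                loads++;
            }
            result.put(f, cached.get(f));
        }
        return result;
    }
}
```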
[PATCH] Performance improvement for some cases of sorted search
---------------------------------------------------------------
Key: LUCENE-769
URL: https://issues.apache.org/jira/browse/LUCENE-769
Project: Lucene - Java
Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Artem Vasiliev
Attachments: DocCachingSorting.patch, DocCachingSorting.patch, StoredFieldSorting.patch
It's a small addition to Lucene that significantly lowers memory
consumption and improves performance for sorted searches in
scenarios with frequent index updates and relatively large indexes
(>1 million docs). This solution currently supports only
single-field sorting (which seems to be quite a popular use case).
Support for multiple fields can be added without much trouble.
The solution is this: instead of caching the given field's values for
the whole index (the current FieldCache approach), the documents from
the sorting set are cached in a WeakHashMap, so the cached items are
candidates for GC. Their field values are then fetched from the
cache and compared while sorting.
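A rough sketch of that caching scheme, restricted to a single sort
field as the patch is. The IntFunction loader stands in for
IndexReader.document() with a FieldSelector, and WeakDocValueComparator
is a hypothetical name; this is not the code from the attached
patches. Because WeakHashMap holds its keys weakly, an entry can vanish
once its key object is no longer strongly referenced, so the
comparator treats any miss as "reload from stored fields".

```java
import java.util.Comparator;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.IntFunction;

// Hypothetical single-field comparator over doc ids that caches the sort
// value of each doc in the result set in a WeakHashMap.
class WeakDocValueComparator implements Comparator<Integer> {
    private static final Comparator<String> ORDER =
            Comparator.nullsLast(Comparator.naturalOrder());

    // An entry may be dropped once its Integer key is no longer strongly
    // reachable (e.g. after the hit list being sorted is discarded).
    private final Map<Integer, String> cache = new WeakHashMap<>();
    private final IntFunction<String> storedFieldLoader; // doc id -> field value

    WeakDocValueComparator(IntFunction<String> storedFieldLoader) {
        this.storedFieldLoader = storedFieldLoader;
    }

    private String value(Integer doc) {
        String v = cache.get(doc);
        if (v == null) {                  // miss, collected entry, or no value
            v = storedFieldLoader.apply(doc);
            cache.put(doc, v);
        }
        return v;
    }

    @Override
    public int compare(Integer a, Integer b) {
        return ORDER.compare(value(a), value(b)); // docs lacking the field sort last
    }
}
```

Usage would look like hits.sort(new WeakDocValueComparator(loader)),
where hits is a List<Integer> of matching doc ids. Only documents in
the result set are ever loaded, which is the memory saving the patch
aims for; the trade-off Chuck describes is that large result sets turn
into many stored-field reads.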
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly, contact one of the
administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------