DO NOT REPLY [Bug 27743] - [PATCH] Added support for segmented field data files and cached directories

bugzilla Thu, 18 Mar 2004 10:31:34 -0800



http://issues.apache.org/bugzilla/show_bug.cgi?id=27743

[PATCH] Added support for segmented field data files and cached directories





------- Additional Comments From [EMAIL PROTECTED]  2004-03-18 18:32 -------
The goal of this patch is to improve performance when all values, in all
documents, of a field are required.  The solution proposed is to split the
stored field data, so that it can be more efficiently read and cached, in effect
optimizing IndexReader.document().

An alternate approach might be to instead index the field.  Then one can use a
TermEnum to enumerate all values of the field, and a TermDocs to enumerate the
documents which have those values.  This is considerably more efficient, as both
the sequence of terms and the sequence of documents are compressed, while field
data accessed through IndexReader.document() is not compressed.  All the required
data is also stored contiguously on disk, minimizing seeks and maximizing its
cacheability.
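
As a rough sketch of the enumeration described above, and not the actual Lucene
code, the following uses a hypothetical in-memory stand-in for an indexed field:
a sorted map from each term to the document numbers containing it.  Iterating
the map plays the role of a TermEnum, and iterating each array plays the role of
a TermDocs.  The class name, method name, and the assumption of at most one
value per document are illustrative only.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an indexed field: each term (a field
// value) maps to the ascending document numbers that contain it.
class FieldValueEnumeration {

    // Walks the terms in sorted order (the role a TermEnum would play) and,
    // for each term, walks its documents (the role a TermDocs would play),
    // filling a table indexed by document number.  Assumes at most one value
    // per document; documents without the field stay null.
    static String[] valuesByDocument(SortedMap<String, int[]> postings, int maxDoc) {
        String[] values = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                values[doc] = entry.getKey();
            }
        }
        return values;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("blue", new int[] {1, 3});
        postings.put("red", new int[] {0});

        String[] values = valuesByDocument(postings, 4);
        for (int doc = 0; doc < values.length; doc++) {
            System.out.println(doc + " -> " + values[doc]);
        }
    }
}
```

Each term and each document number is visited exactly once, in stored order, so
the whole table is built in a single sequential pass instead of one
IndexReader.document() call per document.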

Have you benchmarked your approach versus something like this?  Your change
introduces a fair amount of new code and API complexity.  I prefer to keep
Lucene's API and code as simple and small as possible, so I am reluctant to add
a feature like the one you propose unless there is no viable alternative.
