http://issues.apache.org/bugzilla/show_bug.cgi?id=27743

[PATCH] Added support for segmented field data files and cached directories

------- Additional Comments From [EMAIL PROTECTED] 2004-03-18 18:32 -------

The goal of this patch is to improve performance when all values of a field, in all documents, are required. The proposed solution is to split the stored field data so that it can be read and cached more efficiently, in effect optimizing IndexReader.document().

An alternative approach might be to index the field instead. One can then use a TermEnum to enumerate all values of the field, and a TermDocs to enumerate the documents that have each value. This is considerably more efficient, since both the sequence of terms and the sequence of documents are compressed, while field data accessed through IndexReader.document() is not. All of the required data is also stored contiguously on disk, minimizing seeks and maximizing its cacheability.

Have you benchmarked your approach against something like this?

Your change introduces a fair amount of new code and API complexity. I prefer to keep Lucene's API and code as simple and small as possible, so I am reluctant to add a feature like the one you propose unless there is no viable alternative.
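To make the alternative concrete, here is a rough sketch of the traversal described above: walk the terms of the field in sorted order and, for each term, walk the documents that contain it, filling in a per-document array of values. This is a plain-Java toy model, not Lucene code. The class FieldValuesByTerm, the method valuesByDoc, and the map-of-postings representation are all hypothetical stand-ins for what a TermEnum and TermDocs would supply, and the sketch assumes each document has at most one value for the field.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class FieldValuesByTerm {

    /**
     * Builds a doc-id -> field-value array from a toy inverted index.
     *
     * postings maps each term (field value) to the ids of the documents
     * containing it. Iterating the sorted map plays the role of a term
     * enumeration; iterating each int[] plays the role of a document
     * enumeration for that term. Assumes at most one value per document;
     * documents without a value keep a null entry.
     */
    public static String[] valuesByDoc(SortedMap<String, int[]> postings, int maxDoc) {
        String[] values = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            String term = entry.getKey();
            for (int doc : entry.getValue()) {
                values[doc] = term;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: a "color" field over five documents.
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("blue", new int[] {1, 3});
        postings.put("green", new int[] {0});
        postings.put("red", new int[] {2, 4});

        System.out.println(Arrays.toString(valuesByDoc(postings, 5)));
    }
}
```

The point of the sketch is only the access pattern: one sequential pass over terms and their document lists, rather than one stored-document lookup per document.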