> From: Brian Goetz [mailto:[EMAIL PROTECTED]]
>
> I like the idea of being able to add fields to a Document after the
> Document is indexed.  Then, for documents with a long 'body' and short
> metadata fields, you could process the body through an InputStream
> adapter, which would, as a side effect, store the other fields
> somewhere, and then add them.  Doug, how hard would this be to support
> adding some new fields to an already indexed document?

Before a document can be added to an index, Lucene must sort all of the
terms in it, and thus it must have all of these terms.  This could be
changed.

Some background:

When a document is added, it is written as a segment.  Segments are each
complete indexes, containing documents numbered from zero.  To keep from
having to search too many segments, segments are periodically merged.

When segments are merged, documents in all but the first are re-numbered.
For example, merging two segments each containing three documents
numbered 0, 1, and 2 creates a new segment containing documents numbered
0 through 5.  If there are deleted documents, then more re-numbering
happens as deleted documents are dropped.

Segments and index contents are also merged "softly", on the fly, by
SegmentsReader and MultiSearcher, which permit searching of multiple
segments or entire indexes.  These on-the-fly merges also re-number,
softly.

In order to add partial documents we'd need to change things so that
segments can be merged without renumbering.  A document could be
assigned a number when it is created.  A segment could be written
containing some of its terms, and another segment could be written
containing more.  (For merging to be efficient, we'd probably need to
require that all segments of a document were added before another
document is added.)  Then merging (hard or soft) would combine the
segments of a document for search.

A renumbering merge would still be required to remove deleted document
numbers.  Lucene uses arrays indexed by document number for a few things,
so this is required to keep these arrays from getting too big.  It also
helps with index compression.

Someday when I have the time I can look more closely at how hard this
would be to implement.  It would certainly require changes to lots of
code!

Doug
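
The renumbering described above can be illustrated with a small sketch.
This is not Lucene's code: the Segment class, docBases, and merge are
hypothetical names invented for illustration, and documents are reduced
to plain strings.  It assumes a "soft" merge only offsets each segment's
local document numbers by a base, while a "hard" merge copies the live
documents into a new segment and drops the deleted ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentMergeSketch {

    /** Hypothetical stand-in for a segment: documents numbered from zero. */
    static final class Segment {
        final String[] docs;      // docs[n] is the document numbered n
        final boolean[] deleted;  // deleted[n] marks document n as deleted

        Segment(String[] docs, boolean[] deleted) {
            this.docs = docs;
            this.deleted = deleted;
        }
    }

    /**
     * "Soft" renumbering (an assumption about one way to do it): each
     * segment gets a base offset, so a global document number is
     * base + local number.  Nothing is copied or dropped.
     */
    static int[] docBases(List<Segment> segments) {
        int[] bases = new int[segments.size()];
        int base = 0;
        for (int i = 0; i < segments.size(); i++) {
            bases[i] = base;
            base += segments.get(i).docs.length;
        }
        return bases;
    }

    /**
     * "Hard" merge: copy the live documents of every segment, in order,
     * into one new segment.  Documents are renumbered from zero and
     * deleted documents are dropped, which closes the numbering gaps.
     */
    static Segment merge(List<Segment> segments) {
        List<String> live = new ArrayList<>();
        for (Segment segment : segments) {
            for (int n = 0; n < segment.docs.length; n++) {
                if (!segment.deleted[n]) {
                    live.add(segment.docs[n]);
                }
            }
        }
        // A freshly merged segment has no deletions.
        return new Segment(live.toArray(new String[0]), new boolean[live.size()]);
    }

    public static void main(String[] args) {
        Segment first = new Segment(
            new String[] {"a0", "a1", "a2"}, new boolean[] {false, false, false});
        Segment second = new Segment(
            new String[] {"b0", "b1", "b2"}, new boolean[] {false, true, false});
        List<Segment> segments = Arrays.asList(first, second);

        System.out.println("bases: " + Arrays.toString(docBases(segments)));
        System.out.println("merged: " + Arrays.toString(merge(segments).docs));
    }
}
```

With no deletions, two three-document segments merge into documents 0
through 5, as in the example above.  Here one document in the second
segment is marked deleted, so the hard merge yields five documents and
the documents after the deleted one shift down by one.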
