Along the lines of LUCENE-550, what about having a MemoryIndex that accepts
multiple documents and then writes the index once, during close, in the
Lucene file format (so it could be merged with other segments)?

When adding documents with an IndexWriter, a new segment is created for
each document, and the segments are then periodically merged in memory
and/or with on-disk segments. When constructing an index or updating a
"lot" of documents in an existing index, this write/read/merge cycle seems
inefficient; if the document/field information were maintained in sorted
order (e.g. in TreeMaps), greater efficiency could be realized, as in the
sketch below.
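As a rough illustration only (not Lucene's actual API; the class and method
names here are made up), a TreeMap-based buffer keeps fields and terms in
sorted order, so a segment could later be written out in the order the file
format expects without an extra sort or per-document merge:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // Hypothetical in-memory inverted index: TreeMaps keep fields and
    // terms sorted as documents are added.
    public class SortedRamIndex {

        // field name -> (term -> posting list of doc ids), both sorted
        private final TreeMap<String, TreeMap<String, List<Integer>>> fields =
                new TreeMap<String, TreeMap<String, List<Integer>>>();

        private int nextDocId = 0;

        // Add one document; field values are tokenized naively on whitespace.
        public int addDocument(Map<String, String> doc) {
            int docId = nextDocId++;
            for (Map.Entry<String, String> field : doc.entrySet()) {
                TreeMap<String, List<Integer>> terms = fields.get(field.getKey());
                if (terms == null) {
                    terms = new TreeMap<String, List<Integer>>();
                    fields.put(field.getKey(), terms);
                }
                for (String term : field.getValue().split("\\s+")) {
                    List<Integer> postings = terms.get(term);
                    if (postings == null) {
                        postings = new ArrayList<Integer>();
                        terms.put(term, postings);
                    }
                    postings.add(docId);
                }
            }
            return docId;
        }

        public int numDocs() {
            return nextDocId;
        }
    }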

With such a memory index, the memory needed during updates would increase
dramatically, but it could still be bounded: a disk-based index segment
would be written whenever too many documents accumulate in the memory index
(a "max buffered documents" setting), along the lines of the second sketch
below.
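Again purely as a hypothetical sketch (the flushSegment method is a
placeholder, not real Lucene code), the bounding policy could look like
this: buffer documents in the in-memory index above and flush a normal
on-disk segment once the threshold is reached.

    import java.util.Map;

    // Hypothetical writer that bounds memory use with a maxBufferedDocs
    // threshold, flushing the in-memory buffer as an on-disk segment.
    public class BufferingWriter {

        private final int maxBufferedDocs;
        private SortedRamIndex buffer = new SortedRamIndex();

        public BufferingWriter(int maxBufferedDocs) {
            this.maxBufferedDocs = maxBufferedDocs;
        }

        public void addDocument(Map<String, String> doc) {
            buffer.addDocument(doc);
            if (buffer.numDocs() >= maxBufferedDocs) {
                flushSegment();
            }
        }

        public void close() {
            if (buffer.numDocs() > 0) {
                flushSegment();
            }
        }

        // Placeholder: a real implementation would iterate the sorted
        // fields/terms and write them in the Lucene segment file format,
        // so the result could be merged like any other segment.
        private void flushSegment() {
            // ... write buffer contents as a new segment ...
            buffer = new SortedRamIndex();
        }
    }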

Does this "sound" like an improvement? Has anyone else tried something like
this?
