I've only been loosely following this...
Do you think it is possible to separate the stored/term vector
handling into a separate patch against the current trunk? This seems
like a quick win and I know it has been speculated about before.
On Mar 23, 2007, at 12:00 PM, Michael McCandless wrote:
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
On 3/22/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
Merging is costly because you read all data in then write all data
out, so you want to minimize, for each byte of data in the index, how
many times it will be "serviced" (read in, written out) as part of a
merge.
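To make the "serviced" count concrete, here is a rough sketch of a simplified merge model. It is not Lucene's actual merge policy: it assumes equal-sized flushed segments and merges whenever mergeFactor segments pile up at the same level. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: a toy model of level-based merging.
class MergeCostSketch {

    /**
     * Returns the total number of bytes copied (read in and written out)
     * by merges, assuming every flushed segment has the same size and
     * mergeFactor segments at one level are merged into one segment at
     * the next level.
     */
    static long bytesCopiedByMerges(int flushedSegments, long segmentBytes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        // counts.get(level) = number of segments currently sitting at that level
        List<Integer> counts = new ArrayList<Integer>();
        long copied = 0;
        for (int i = 0; i < flushedSegments; i++) {
            int level = 0;
            long size = segmentBytes;
            while (true) {
                if (counts.size() <= level) {
                    counts.add(0);
                }
                counts.set(level, counts.get(level) + 1);
                if (counts.get(level) < mergeFactor) {
                    break;
                }
                // mergeFactor segments at this level: merge them into one
                // segment at the next level, copying every byte once.
                counts.set(level, 0);
                size *= mergeFactor;
                copied += size;
                level++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        long total = 100L * 1L; // 100 flushed segments of 1 byte each
        long copied = bytesCopiedByMerges(100, 1L, 10);
        // Average number of times each byte was serviced by a merge
        System.out.println((double) copied / total);
    }
}
```

In this toy model, 100 flushed segments with mergeFactor=10 means every byte is copied twice (once at each of two merge levels), and 1000 segments means three times. Anything that keeps bytes out of those copies lowers the merge cost.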
Avoiding the re-writing of stored fields might be nice:
http://www.nabble.com/Re%3A--jira--Commented%3A-%28LUCENE-565%29-Supporting-deleteDocuments-in-IndexWriter-%28Code-and-Performance-Results-Provided%29-p6177280.html
That's exactly the approach I'm taking in LUCENE-843: stored fields
and term vectors are immediately written to disk. Only frq, prx and
tis use up memory. This greatly extends how many docs you can buffer
before having to flush (assuming your docs have stored fields and term
vectors).
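As a back-of-the-envelope illustration of why this extends the buffer, the sketch below divides a RAM budget by the per-document memory footprint, with and without stored fields and term vectors held in memory. The class name, the method, and all byte counts are hypothetical and only meant to show the arithmetic; they are not measurements from LUCENE-843.

```java
// Hypothetical helper, not part of Lucene: simple buffer-capacity arithmetic.
class BufferCapacitySketch {

    /**
     * Returns how many docs fit in the RAM budget before a flush is needed.
     * postingsBytesPerDoc stands for the frq/prx/tis data kept in memory;
     * storedBytesPerDoc stands for stored fields plus term vectors, which
     * only count against RAM when storedInRam is true.
     */
    static long docsBeforeFlush(long ramBudgetBytes,
                                long postingsBytesPerDoc,
                                long storedBytesPerDoc,
                                boolean storedInRam) {
        long perDoc = postingsBytesPerDoc + (storedInRam ? storedBytesPerDoc : 0L);
        if (perDoc <= 0) {
            throw new IllegalArgumentException("per-doc RAM usage must be positive");
        }
        return ramBudgetBytes / perDoc;
    }

    public static void main(String[] args) {
        long budget = 16L * 1024 * 1024; // assumed 16 MB buffer
        // Assumed sizes: 512 bytes of postings, 1536 bytes of stored data per doc
        System.out.println(docsBeforeFlush(budget, 512, 1536, true));
        System.out.println(docsBeforeFlush(budget, 512, 1536, false));
    }
}
```

With these made-up sizes, writing stored fields and term vectors straight to disk lets the same budget hold four times as many docs; the real gain depends entirely on how large the stored data is relative to the postings.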
When memory is full, I either flush a segment to disk (when the writer
is in autoCommit=true mode) or flush the data to tmp files, which are
finally merged into a segment when the writer is closed. This merging
is less costly because the bytes in/out are just frq, prx and tis, so
this improves performance of autoCommit=false mode vs autoCommit=true
mode.
But this is only for the segment created from buffered docs (i.e. the
segment created by a "flush"). Subsequent merges still must copy
bytes in/out, and in LUCENE-843 I haven't changed anything about how
segments are merged.
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ