I like the idea, Paul.

As for how it should be implemented, perhaps a count of docs in
memory should be kept.  It doesn't seem necessary to traverse all of
the segments on every add (the traversal is a linear operation, and it
will only result in a merge once every "minMergeDocs" or
"maxBufferedDocs" adds).
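
To make the counting idea concrete, here is a minimal sketch. It is not
Lucene's IndexWriter; the class CountingWriterSketch and its fields are
hypothetical names that only model the bookkeeping, and the
maxBufferedDocs threshold is borrowed from the setting named above. The
point is that each add costs one increment and one comparison, and the
expensive segment check runs only when the counter reaches the threshold.

```java
// Hypothetical sketch, not Lucene's IndexWriter: it models only the counter.
final class CountingWriterSketch {
    private final int maxBufferedDocs; // threshold, named after the setting in this thread
    private int bufferedDocs = 0;      // docs added since the last merge check
    private int mergeChecks = 0;       // how many times the expensive path ran

    CountingWriterSketch(int maxBufferedDocs) {
        if (maxBufferedDocs < 1) {
            throw new IllegalArgumentException("maxBufferedDocs must be >= 1");
        }
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument(String doc) {
        // A real writer would buffer the document as an in-memory segment here.
        bufferedDocs++;
        // O(1) test per add instead of walking every segment.
        if (bufferedDocs >= maxBufferedDocs) {
            maybeMergeSegments();
            bufferedDocs = 0;
        }
    }

    private void maybeMergeSegments() {
        // Stand-in for the real segment traversal and merge decision.
        mergeChecks++;
    }

    int bufferedDocs() { return bufferedDocs; }

    int mergeChecks() { return mergeChecks; }

    public static void main(String[] args) {
        CountingWriterSketch writer = new CountingWriterSketch(10);
        for (int i = 0; i < 25; i++) {
            writer.addDocument("doc-" + i);
        }
        // prints "2 merge checks, 5 docs still buffered"
        System.out.println(writer.mergeChecks() + " merge checks, "
                + writer.bufferedDocs() + " docs still buffered");
    }
}
```

A real implementation would also have to flush whatever is still
buffered when the writer is closed or optimized, which this sketch
leaves out.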

-Yonik

On 5/16/05, Paul Smith <[EMAIL PROTECTED]> wrote:
> In summary, I still firmly believe that the IndexWriter.maybeMergeSegments()
> is chewing a lot more CPU than would be ideal.  So I ran a simple test.  I
> ran the same test I've done before, using mergeFactor(1000),
> maxBufferedDocs(10000), and useCompoundFile(false), indexing 5 fields (user
> first/lastname/email address).
> 
> As a baseline using the latest SVN source code, I'm getting an indexing rate
> of between 490-515 items/second over a number of runs.
> 
> By applying the attached simple patch to IndexWriter, I'm getting between
> 945-970 items/second over a number of test runs.  That's a significant speed up.  All the
> patch is doing is deferring the call to maybeMergeSegments so it only does
> it every 2000 iterations (2000 is totally arbitrary on my part).
> 
> I've verified with Luke that the index generated contains the same #
> documents, and same # terms, but I have not had a chance to properly set up
> my local environment to run the test cases.
> 
> Obviously the attached patch is a dirty hack of the highest order. In my
> case I'm re-indexing from scratch every time, so there may be a reason why
> we shouldn't be doing this sort of deferring of method calls.  Perhaps the
> source code is optimized around incremental/batch updates to _existing_
> indexes, with the penalty that creating a new index performs slower than
> one would like.
> 
> Perhaps IndexWriter could benefit from another setting that lets one
> configure how often to call maybeMergeSegments()?  That could of course
> confuse more people than it helps.
> 
> I would really appreciate anyone's thoughts on this; I'll be very happy to be
> proven wrong because it will just help me understand more of Lucene.  I
> would hope that speeding up indexing would benefit everyone?  Particularly
> the large scale sites out there.
> 
> cheers,
> 
> Paul Smith
