Michael McCandless wrote on 01/15/2007 01:49 AM:
> Chuck,
>
>> Possibly related, one of the ways I improved concurrency in
>> ParallelWriter was to break up IndexWriter.addDocument() into one method
>> to invert the document and create a RAMSegment and a second method that
>> takes the RAMSegment and merges it into the index.  This allows
>> inversions to be processed in parallel, while merging is already a
>> critical section.  (Side thought:  I've been wondering how hard it would
>> be to make merging not a critical section).  I had thought of the method
>> to take the RAMSegment and merge it to be the "commit" part of
>> addDocument().
>
>> Your notion of commit is much better and more flexible, but perhaps you
>> could include this inversion/merge separation as well?
>
> I'm a little confused on what this would mean?  Do you mean opening up
> separate public methods: one to invert (and get a segment back) and
> one to append (and possibly merge) a segment to the index (keeping the
> existing addDocument that would then just call these two)?  How would
> this buy you more concurrency (since the current method indeed only
> synchronizes around the merge part)?  Oh: would you behind the scenes
> take each "single doc" segment and pre-merge them privately,
> concurrently, possibly up to many levels, and then finally
> add the merged segment into the index?  Ie, the beginnings of
> "concurrent merge" described above?
>
> Actually couldn't we do this change today (ie without waiting for
> explicit commits)?  It seems like a separable change.

Yes, I've already made this change so it is independent, creating
invertDocument(), addInvertedDocument() and abortInvertedDocument(). 
This enables more concurrency in ParallelWriter because there are no
synchronization restrictions at all on calling invertDocument(). 
addInvertedDocument() has a synchronization requirement:  it can be
called in parallel for each subdocument corresponding to the same
document, but not for subdocuments corresponding to different documents
as this could break the required parallel subindex doc-id
correspondence.  Because addDocument() (which is just
addInvertedDocument(invertDocument())) contains the call to
addInvertedDocument(), it carries the same synchronization requirement,
so callers that go through addDocument() lose the extra parallelism
available from calling invertDocument() on its own.
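
To make the split concrete, here is a minimal sketch of the shape I mean.
It is not the actual patch and not Lucene code: SplitWriterSketch,
InvertedDoc, the whitespace tokenizer and the in-memory postings map are
all hypothetical stand-ins (InvertedDoc plays the role of the RAMSegment),
and it ignores the ParallelWriter subdocument case entirely.  Only the
three method names come from the change described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a writer; NOT Lucene's IndexWriter.
class SplitWriterSketch {
    // Assumed stand-in for a single-document RAM segment: term -> frequency.
    static final class InvertedDoc {
        final Map<String, Integer> termFreqs = new HashMap<>();
    }

    // Shared index state: term -> ids of the documents containing it.
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Touches no shared state, so any number of threads may call it at once.
    static InvertedDoc invertDocument(String text) {
        InvertedDoc inverted = new InvertedDoc();
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                inverted.termFreqs.merge(term, 1, Integer::sum);
            }
        }
        return inverted;
    }

    // The critical section: assigns the doc id and merges into the index.
    synchronized int addInvertedDocument(InvertedDoc inverted) {
        int docId = nextDocId++;
        for (String term : inverted.termFreqs.keySet()) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    // Discards an inverted document that will never be added.
    static void abortInvertedDocument(InvertedDoc inverted) {
        inverted.termFreqs.clear();
    }

    // Convenience path: inversion is serialized with the add by the caller.
    int addDocument(String text) {
        return addInvertedDocument(invertDocument(text));
    }

    synchronized List<Integer> docsFor(String term) {
        List<Integer> ids = postings.get(term);
        return ids == null ? new ArrayList<Integer>() : new ArrayList<Integer>(ids);
    }

    public static void main(String[] args) throws Exception {
        String[] docs = {"apple banana", "banana cherry", "apple cherry apple"};
        SplitWriterSketch writer = new SplitWriterSketch();
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            // Inversions run in parallel with no locking.
            List<Future<InvertedDoc>> pending = new ArrayList<>();
            for (String doc : docs) {
                pending.add(pool.submit(() -> invertDocument(doc)));
            }
            // Adds happen one at a time, in submission order.
            for (Future<InvertedDoc> f : pending) {
                writer.addInvertedDocument(f.get());
            }
        } finally {
            pool.shutdown();
        }
        System.out.println("apple -> " + writer.docsFor("apple"));
    }
}
```

The point of the sketch is only that a caller holding its own thread pool
can overlap the inversions and serialize just the adds, which is not
possible when everything goes through addDocument().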

It seemed to me that this could be related to your explicit-commits
idea since it also breaks up writes into an uncommitted local portion
and committed portion.

Hope you put your explicit commits idea together soon!

Chuck

