Michael McCandless wrote on 01/15/2007 01:49 AM:
> Chuck,
>
>> Possibly related, one of the ways I improved concurrency in
>> ParallelWriter was to break up IndexWriter.addDocument() into one method
>> to invert the document and create a RAMSegment and a second method that
>> takes the RAMSegment and merges it into the index. This allows
>> inversions to be processed in parallel, while merging is already a
>> critical section. (Side thought: I've been wondering how hard it would
>> be to make merging not a critical section). I had thought of the method
>> to take the RAMSegment and merge it to be the "commit" part of
>> addDocument().
>
>> Your notion of commit is much better and more flexible, but perhaps you
>> could include this inversion/merge separation as well?
>
> I'm a little confused on what this would mean? Do you mean opening up
> separate public methods: one to invert (and get a segment back) and
> one to append (and possibly merge) a segment to the index (keeping the
> existing addDocument that would then just call these two)? How would
> this buy you more concurrency (since the current method indeed only
> synchronizes around the merge part)? Oh: would you behind the scenes
> take each "single doc" segment and pre-merge them privately,
> concurrently, possibly up to many levels, privately, and then finally
> add the merged segment into the index? Ie, the beginnings of
> "concurrent merge" described above?
>
> Actually couldn't we do this change today (ie without waiting for
> explicit commits)? It seems like a separable change.

Yes, I've already made this change so that it is independent, creating
invertDocument(), addInvertedDocument() and abortInvertedDocument().
This enables more concurrency in ParallelWriter because there are no
synchronization restrictions at all on calling invertDocument().

addInvertedDocument() does have a synchronization requirement: it can be
called in parallel for each subdocument corresponding to the same
document, but not for subdocuments corresponding to different documents,
as this could break the required doc-id correspondence between the
parallel subindexes. Because addDocument() (which is just
addInvertedDocument(invertDocument())) contains the call to
addInvertedDocument(), it has the same synchronization requirement,
which prevents the extra parallelism in the invertDocument() calls.

It seemed to me that this could be related to your explicit-commits
idea, since it also breaks up writes into an uncommitted local portion
and a committed portion.

Hope you put your explicit commits idea together soon!

Chuck
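
P.S. In case it helps to see the shape of the split, here is a rough, self-contained sketch. It is a toy model and not the actual patch: ToyWriter, InvertedDoc and indexAll are made-up names, the "inverted document" is just a term-frequency map standing in for a RAMSegment, and the "merge" is a plain list append. The only point is that invertDocument() touches no shared state and can run on any thread, while addInvertedDocument() is the critical section that assigns doc ids in a fixed order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvertThenAddSketch {

    /** Hypothetical stand-in for a single-document RAMSegment. */
    static final class InvertedDoc {
        final Map<String, Integer> termFreqs;
        InvertedDoc(Map<String, Integer> termFreqs) { this.termFreqs = termFreqs; }
    }

    /** Toy writer; it only models the synchronization split, not real indexing. */
    static final class ToyWriter {
        private final List<InvertedDoc> segments = new ArrayList<>();

        /** Reads and writes only local state, so it needs no lock. */
        InvertedDoc invertDocument(String text) {
            Map<String, Integer> termFreqs = new TreeMap<>();
            for (String term : text.toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    termFreqs.merge(term, 1, Integer::sum);
                }
            }
            return new InvertedDoc(termFreqs);
        }

        /** Critical section: appends the segment and returns its doc id. */
        synchronized int addInvertedDocument(InvertedDoc doc) {
            segments.add(doc);
            return segments.size() - 1;
        }

        /** The combined call, which inherits the restriction of the add step. */
        int addDocument(String text) {
            return addInvertedDocument(invertDocument(text));
        }

        synchronized int docCount() { return segments.size(); }

        synchronized InvertedDoc get(int docId) { return segments.get(docId); }
    }

    /** Inverts all documents on a thread pool, then adds them in input order. */
    static List<Integer> indexAll(ToyWriter writer, List<String> docs, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<InvertedDoc>> pending = new ArrayList<>();
            for (String doc : docs) {
                pending.add(pool.submit(() -> writer.invertDocument(doc)));
            }
            List<Integer> docIds = new ArrayList<>();
            for (Future<InvertedDoc> future : pending) {
                // Adding from one thread, in submission order, keeps doc ids stable.
                docIds.add(writer.addInvertedDocument(future.get()));
            }
            return docIds;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        ToyWriter writer = new ToyWriter();
        List<String> docs = List.of("a b a", "c d", "a c");
        System.out.println(indexAll(writer, docs, 4));
    }
}
```

Calling addDocument() from the pool instead would also be correct here, but every worker would then queue on the writer's lock in whatever order the threads happened to arrive, which is exactly what breaks doc-id correspondence across parallel subindexes.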