[ https://issues.apache.org/jira/browse/LUCENE-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749678#action_12749678 ]
Chuck Williams commented on LUCENE-600:
---------------------------------------

A given logical Document must have the same doc-id in each subindex. This is maintained by a merge policy that guarantees consistency across the subindexes: either merge-by-count or merge-by-size, as dictated by the size-dominant subindex. I just read your wiki page, and it looks like your MasterMergePolicy does the same thing for the merge-by-size case, right?

We've been using parallel incremental indexing in production apps for a long time now, along with the efficient update mechanism described in the patent application. The original company I did this work for was acquired by a larger company, which now owns the IP. I don't know how they would feel about a contribution of the latest version of ParallelWriter, which works with the current Lucene. I could inquire if you are truly open to it, but it sounds like you may be on your own path to something quite similar.

Your wiki page says, "when you need to reindex this field you can simply create a new generation of this parallel index and fill it with the new values". That is the crux of the problem, and it is the one we created an efficient algorithm and implementation for several years ago. ParallelWriter is the easy part.

> ParallelWriter companion to ParallelReader
> ------------------------------------------
>
>                 Key: LUCENE-600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-600
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Chuck Williams
>            Priority: Minor
>         Attachments: ParallelWriter.patch
>
>
> A new class ParallelWriter is provided that serves as a companion to
> ParallelReader. ParallelWriter meets all of the doc-id synchronization
> requirements of ParallelReader, subject to:
> 1. ParallelWriter.addDocument() is synchronized, which might have an
> adverse effect on performance. The writes to the sub-indexes are, however,
> done in parallel.
> 2. The application must ensure that the ParallelReader is never reopened
> inside ParallelWriter.addDocument(), else it might find the sub-indexes out
> of sync.
> 3. The application must deal with recovery from
> ParallelWriter.addDocument() exceptions. Recovery must restore the
> synchronization of doc-ids, e.g. by deleting any trailing document(s) in one
> sub-index that were not successfully added to all sub-indexes, and then
> optimizing all sub-indexes.
> A new interface, Writable, is provided to abstract IndexWriter and
> ParallelWriter. This is in the same spirit as the existing Searchable and
> Fieldable classes.
> This implementation uses Java 1.5. The patch applies against today's svn
> head. All tests pass, including the new TestParallelWriter.
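To make the doc-id synchronization requirements of ParallelWriter concrete, here is a minimal, self-contained sketch in modern Java. It is not the code in ParallelWriter.patch (which targets Java 1.5 and Lucene's real IndexWriter), and every name in it is hypothetical: `SubIndex` is an in-memory stand-in for one sub-index writer, and `ParallelWriterSketch` mirrors the constraints in the description. `addDocument()` is synchronized so only one logical document is in flight at a time, the per-sub-index writes run in parallel on an executor, and if any write fails, every sub-index is trimmed back to the same `maxDoc()`. `truncateTo()` is an assumption that collapses "delete the trailing documents, then optimize" into a single call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-memory stand-in for one sub-index writer (NOT Lucene's IndexWriter).
// A document's position in the list plays the role of its doc-id.
class SubIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final boolean failOnAdd; // lets the demo simulate a failed write

    SubIndex(boolean failOnAdd) { this.failOnAdd = failOnAdd; }

    synchronized void addDocument(Map<String, String> fields) {
        if (failOnAdd) throw new IllegalStateException("simulated write failure");
        docs.add(fields);
    }

    synchronized int maxDoc() { return docs.size(); }

    // Hypothetical recovery hook standing in for "delete trailing docs, then optimize".
    synchronized void truncateTo(int target) {
        while (docs.size() > target) docs.remove(docs.size() - 1);
    }
}

// Hypothetical sketch of the ParallelWriter idea; not the patch's implementation.
class ParallelWriterSketch implements AutoCloseable {
    private final List<SubIndex> subIndexes;
    private final ExecutorService pool;

    ParallelWriterSketch(List<SubIndex> subIndexes) {
        if (subIndexes.isEmpty()) throw new IllegalArgumentException("need at least one sub-index");
        this.subIndexes = new ArrayList<>(subIndexes);
        this.pool = Executors.newFixedThreadPool(subIndexes.size());
    }

    // Adds one logical document; parts.get(i) holds the fields destined for sub-index i.
    // Synchronized so that only one logical document is in flight at a time.
    synchronized int addDocument(List<Map<String, String>> parts) {
        if (parts.size() != subIndexes.size()) {
            throw new IllegalArgumentException("expected one part per sub-index");
        }
        int docId = subIndexes.get(0).maxDoc(); // invariant: all sub-indexes agree here
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            SubIndex sub = subIndexes.get(i);
            Map<String, String> part = parts.get(i);
            pending.add(pool.submit(() -> sub.addDocument(part))); // writes run in parallel
        }
        RuntimeException failure = null;
        boolean interrupted = false;
        for (Future<?> f : pending) {
            // Wait for every write, even after a failure, so recovery sees a quiescent state.
            while (true) {
                try {
                    f.get();
                    break;
                } catch (InterruptedException e) {
                    interrupted = true; // remember it, but keep waiting
                } catch (ExecutionException e) {
                    if (failure == null) {
                        failure = new IllegalStateException("sub-index write failed", e.getCause());
                    }
                    break;
                }
            }
        }
        if (interrupted) Thread.currentThread().interrupt(); // restore the caller's status
        if (failure != null) {
            // Recovery: drop any trailing document that did not reach every sub-index.
            for (SubIndex sub : subIndexes) sub.truncateTo(docId);
            throw failure;
        }
        return docId;
    }

    @Override
    public void close() { pool.shutdown(); }
}
```

Because each successful call grows every sub-index by exactly one document, the returned doc-id is identical across sub-indexes, which is the invariant a ParallelReader relies on. The sketch deliberately leaves out reader reopening (constraint 2) and segment merging; in a real index the merge policy would also have to keep merges consistent across sub-indexes, as the comment describes.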