Hi All,

I'd like to open up the API to mergeSegments() in IndexWriter and am wondering whether there are potential problems with doing so.

I use ParallelReader and ParallelWriter (in Jira) extensively, as these provide the basis for fast bulk updates of small metadata fields. ParallelReader requires that the subindexes be strictly synchronized by matching doc ids. The thorniest problem arises when writing a new document with ParallelWriter generates an exception in some of the subindexes but not others, as this leaves the subindexes out of sync.

My current recovery works by deleting the successfully added subdocuments that are parallel to any unsuccessful subdocument, and then optimizing to expunge the unsuccessful doc id from those segments where it had been added. Optimization is prohibitively expensive for large indexes, and unnecessary for this recovery.

A much better solution is an API in IndexWriter to expunge a given set of deleted doc ids. This could merge only enough recent segments to fully encompass the specified docs, which in this case is not many, since the docs will have been added recently. The result should be an orders-of-magnitude performance improvement in the recovery.

I'm planning to make this change and submit a patch for it unless I've missed something that somebody can point out. At the same time, I'll update the ParallelWriter submission, as there are a number of bug fixes plus a substantial general (non-recovery-case) performance improvement I've just identified and am about to implement.

Thanks for any thoughts, suggestions, or problems you can point out.

Chuck
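
P.S. To make the "merge only enough recent segments" idea concrete, here is a rough sketch of the segment-selection step. It is only an illustration and not existing Lucene code: the class ExpungeSketch and the method firstSegmentToMerge are hypothetical names, and it assumes that segments are listed oldest to newest and that doc ids are assigned contiguously across segments in that order.

```java
public class ExpungeSketch {

    /**
     * Hypothetical helper, not part of Lucene.
     *
     * Given the doc count of each segment (oldest first) and the doc ids
     * to expunge, returns the index of the oldest segment that holds any
     * of those docs. Merging that segment and every newer one would then
     * encompass all of the specified docs.
     *
     * Returns segmentDocCounts.length when there is nothing to expunge.
     */
    public static int firstSegmentToMerge(int[] segmentDocCounts, int[] deletedDocIds) {
        if (deletedDocIds.length == 0) {
            return segmentDocCounts.length; // nothing to merge
        }

        // Only the smallest doc id matters: it determines how far back
        // the merge has to reach.
        int minDoc = Integer.MAX_VALUE;
        for (int id : deletedDocIds) {
            if (id < 0) {
                throw new IllegalArgumentException("negative doc id: " + id);
            }
            minDoc = Math.min(minDoc, id);
        }

        // Walk the segments, tracking the first doc id of each one.
        int base = 0;
        for (int i = 0; i < segmentDocCounts.length; i++) {
            int end = base + segmentDocCounts[i]; // exclusive upper bound
            if (minDoc < end) {
                return i;
            }
            base = end;
        }
        throw new IllegalArgumentException("doc id out of range: " + minDoc);
    }
}
```

With segment sizes of 1000, 100, 10 and 10 docs and failed doc ids 1105 and 1115, this picks the third segment, so only the two small, most recent segments would be merged rather than the whole index.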