[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674662#action_12674662 ]
Michael McCandless commented on LUCENE-1516:
--------------------------------------------

Looks good, Jason. This is a big change, and I expect to go through a number of iterations before settling... plus we still need to figure out how the API is exposed. Comments:

* All this logic needs to be conditional (this also depends on which API we actually settle on to expose this...): right now you always open a reader whenever IW is created.

* We should assume we do not need to support autoCommit=true in this patch (since this will land after 3.0). This simplifies things.

* IW.reopenInternalReader only does a clone, not a reopen; how does it cover the newly flushed segment?

* After a merge commits, you don't seem to reopen the reader? This is actually tricky to do right for realtime search: we somehow need to allow for warming of the newly created (merged) segment in such a way that we do not block the flushing of further segments and the reopening of readers against those new segments. I think what may be best is to subclass IW and override a newly added "postMerge" method that's invoked on the new segment before the merge is committed into the SegmentInfos. This is cleaner than allowing the change into the SegmentInfos and then having to make a custom deletion policy and track the history of each segment.

* It seems like reader.reopen() (where reader was obtained with IW.getReader()) doesn't do the right thing? (i.e., it's looking for the most recent segments_N in the Directory, but it should be looking at IW.segmentInfos.)

* I think we should decouple "materializing deletes down to docIDs" from "flushing deletes to disk". IW does both as the same operation now (because it doesn't want to hold SRs open for a long time), but once we have persistent open SegmentReaders we should separate these. It's not necessary for IW to write new .del files when it materializes deletes.

* Instead of having to merge readers, I think we should have a single source to obtain an SR from.
This way, when IW needs to materialize deletes, it will grab the same instance of SR for a given segment that the currently open MSR (MultiSegmentReader) is using. Also, when merging kicks off, it'll grab the SR from the same source (this way deletes in RAM will be correctly merged away). Also, I think we should not use MSR for doing deletions (and should still go segment by segment): it's quite a bit slower, since every invocation must redo the binary search to find the segment that owns the docID.

* Likewise, you have to fix commitMergedDeletes to decouple computing the new BitVector from writing the .del file to disk. That method should only create a new BitVector for the newly merged segment. It must be synchronized to prevent any new deletions against the segments that were just merged. In fact, this is a real danger: after a merge finishes, if one continues to use an older reader to do deletions, you get into trouble.

* I still don't really like having both the IR and IW able to do deletions, with slightly different semantics. As it stands now, since you can't predict when IW materializes deletes, your reader will suddenly see a bunch of deletes appear. I think it's better if no deletes appear, ever, until you reopen your reader. Maybe we simply prevent deletion through the IR?

* We need some serious unit tests here!

> Integrate IndexReader with IndexWriter
> ---------------------------------------
>
> Key: LUCENE-1516
> URL: https://issues.apache.org/jira/browse/LUCENE-1516
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> The current problem is that an IndexReader and an IndexWriter cannot
> both be open at the same time and perform updates, as they both
> require a write lock to the index.
> While methods such as IW.deleteDocuments enable deleting from IW,
> methods such as IR.deleteDocument(int doc) and norms updating are not
> available from IW. This limits the ability to perform updates to the
> index dynamically or in realtime without closing the IW and opening an
> IR, deleting or updating norms, flushing, then opening the IW again, a
> process which can be detrimental to realtime updates.
>
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable, including reopen and clone.
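The "postMerge" hook suggested in the comment (warm the merged segment before the merge is committed into the SegmentInfos, without blocking flushes) can be illustrated with a minimal, self-contained Java sketch. Everything here is hypothetical: `SketchWriter` is a toy stand-in for IW whose segment list is just a list of names, the actual merge work is elided, and `postMerge` is the proposed hook, not an existing Lucene method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy writer: its "SegmentInfos" is just a list of segment names.
class SketchWriter {
    private final List<String> segmentInfos = new ArrayList<>();

    // A flush appends a new segment; it holds the lock only briefly.
    synchronized void flush(String newSegment) {
        segmentInfos.add(newSegment);
    }

    // What a newly opened or reopened reader would see.
    synchronized List<String> snapshot() {
        return new ArrayList<>(segmentInfos);
    }

    // Proposed hook: invoked on the merged segment before it is committed.
    protected void postMerge(String mergedSegment) {
        // Default: no warming.
    }

    // Assumes the sources are contiguous in segmentInfos and listed in order.
    void commitMerge(List<String> sources, String mergedSegment) {
        // (Merging the source segments into mergedSegment is elided.)
        // Warming runs outside the lock, so flush() and snapshot() are not blocked.
        postMerge(mergedSegment);
        synchronized (this) {
            int insertAt = segmentInfos.indexOf(sources.get(0));
            if (insertAt < 0) {
                throw new IllegalStateException("source segment no longer present");
            }
            segmentInfos.removeAll(sources);
            segmentInfos.add(insertAt, mergedSegment);
        }
    }
}

// Example subclass that "warms" the merged segment before readers can see it.
class WarmingWriter extends SketchWriter {
    final List<String> warmed = new ArrayList<>();

    @Override
    protected void postMerge(String mergedSegment) {
        if (snapshot().contains(mergedSegment)) {
            throw new IllegalStateException("merged segment visible before warming");
        }
        // A real warmer would run queries or load caches here; we only record it.
        warmed.add(mergedSegment);
    }
}
```

The design point is that only the final swap into the segment list takes the writer's lock; the potentially slow warming step does not, so new segments can keep being flushed and readers reopened while it runs.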
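To illustrate decoupling "materializing deletes down to docIDs" from "flushing deletes to disk", here is a hypothetical per-segment delete holder in Java. It assumes `java.util.BitSet` in place of Lucene's BitVector and simulates the .del file write with a counter; `SegmentDeletes` and its methods are illustrative names, not Lucene APIs.

```java
import java.util.BitSet;

// Hypothetical per-segment delete state, as a persistently open SR might keep it.
final class SegmentDeletes {
    private final BitSet deletedDocs = new BitSet();
    private boolean dirty = false;
    private int delFilesWritten = 0; // counts simulated .del file writes

    // Materialize: resolve a buffered delete down to a docID, in RAM only.
    synchronized void materialize(int docId) {
        if (!deletedDocs.get(docId)) {
            deletedDocs.set(docId);
            dirty = true;
        }
    }

    synchronized boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    // Flush: persist the bits only at commit time, and only if something changed.
    synchronized void commit() {
        if (dirty) {
            delFilesWritten++; // a real implementation would write a new .del file here
            dirty = false;
        }
    }

    synchronized int delFilesWritten() {
        return delFilesWritten;
    }
}
```

Any number of materializations can happen between commits without touching the Directory; the disk write is deferred until the writer actually commits.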
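The "single source to obtain an SR from" could look roughly like the reference-counted pool below. This is a sketch under stated assumptions: `SegmentReaderPool` and `PooledSegmentReader` are hypothetical names, and the pooled reader holds no index data. The point is only that the open MSR, delete materialization, and merging all receive the same instance for a given segment, so deletes held in RAM are seen by all of them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-segment reader (not Lucene's SegmentReader).
final class PooledSegmentReader {
    final String segmentName;
    private int refCount = 0;

    PooledSegmentReader(String segmentName) {
        this.segmentName = segmentName;
    }

    void incRef() { refCount++; }

    // Returns true once the last reference has been released.
    boolean decRef() { return --refCount == 0; }

    int refCount() { return refCount; }
}

// Hypothetical single source of per-segment readers, shared by the NRT reader,
// delete materialization, and merging.
final class SegmentReaderPool {
    private final Map<String, PooledSegmentReader> readers = new HashMap<>();

    synchronized PooledSegmentReader get(String segmentName) {
        PooledSegmentReader r = readers.computeIfAbsent(segmentName, PooledSegmentReader::new);
        r.incRef();
        return r;
    }

    synchronized void release(PooledSegmentReader r) {
        if (r.decRef()) {
            // Last user is gone: drop it so the segment's files could be closed.
            readers.remove(r.segmentName);
        }
    }

    synchronized boolean isPooled(String segmentName) {
        return readers.containsKey(segmentName);
    }
}
```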
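The cost difference between deleting through MSR and going segment by segment can be seen in a small Java sketch. A composite reader must map each global docID to its segment with a binary search over the segment start offsets, while a per-segment caller already knows the segment and local docID. `CompositeDeletes` is hypothetical and uses `java.util.BitSet` for each segment's deleted docs.

```java
import java.util.BitSet;

// Hypothetical composite view over several segments' deletions.
final class CompositeDeletes {
    private final int[] starts;     // starts[i] = first global docID of segment i
    private final BitSet[] deleted; // per-segment deletions, by local docID

    CompositeDeletes(int[] maxDocs) {
        starts = new int[maxDocs.length];
        deleted = new BitSet[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = base;
            deleted[i] = new BitSet(maxDocs[i]);
            base += maxDocs[i];
        }
    }

    // Binary search for the last segment whose start is <= globalDoc
    // (taking the last one skips empty segments that share a start offset).
    int segmentFor(int globalDoc) {
        int lo = 0, hi = starts.length - 1, result = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (starts[mid] <= globalDoc) {
                result = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    // Composite path: pays for a binary search on every call.
    void deleteGlobal(int globalDoc) {
        int seg = segmentFor(globalDoc);
        deleted[seg].set(globalDoc - starts[seg]);
    }

    // Segment-by-segment path: the caller already knows the segment, no search.
    void deleteLocal(int seg, int localDoc) {
        deleted[seg].set(localDoc);
    }

    boolean isDeleted(int seg, int localDoc) {
        return deleted[seg].get(localDoc);
    }
}
```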
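The "new BitVector for the newly merged segment" computation in commitMergedDeletes amounts to remapping docIDs. Documents already deleted when the merge started were dropped and have no slot in the merged segment; documents deleted while the merge ran must be marked at their new position. The following is a minimal sketch. It assumes `java.util.BitSet` in place of BitVector and a caller that holds the writer's lock so no further deletions hit the source segments. `MergedDeletes.remap` is a hypothetical helper, and it writes nothing to disk.

```java
import java.util.BitSet;

final class MergedDeletes {
    /**
     * @param maxDocs        maxDoc of each merged source segment, in merge order
     * @param deletedAtStart deletions each source had when the merge started (dropped docs)
     * @param deletedNow     deletions each source has now, at merge commit
     * @return deletions for the merged segment, indexed by its new docIDs
     */
    static BitSet remap(int[] maxDocs, BitSet[] deletedAtStart, BitSet[] deletedNow) {
        BitSet merged = new BitSet();
        int newDoc = 0;
        for (int seg = 0; seg < maxDocs.length; seg++) {
            for (int doc = 0; doc < maxDocs[seg]; doc++) {
                if (deletedAtStart[seg].get(doc)) {
                    continue; // merged away: no slot in the new segment
                }
                if (deletedNow[seg].get(doc)) {
                    merged.set(newDoc); // deleted while the merge was running
                }
                newDoc++;
            }
        }
        return merged;
    }
}
```

Computing this only in RAM keeps it independent of when (or whether) a .del file for the merged segment is eventually written.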
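The semantics argued for in the comment, where a reader sees no new deletes until it is reopened and does not accept deletes itself, can be sketched as a reader that freezes a copy of the deleted-docs bits when it is opened. `LiveDeletes` and `PointInTimeReader` are hypothetical names. Copying the whole BitSet on every reopen is a simplification chosen for clarity, not a claim about how IW.getReader would implement it.

```java
import java.util.BitSet;

// Hypothetical writer-side, mutable deletions for one segment.
final class LiveDeletes {
    private final BitSet deleted = new BitSet();

    synchronized void delete(int doc) {
        deleted.set(doc);
    }

    synchronized BitSet copy() {
        return (BitSet) deleted.clone();
    }
}

// Hypothetical reader with a frozen view: later writer deletes stay invisible until reopen().
final class PointInTimeReader {
    private final LiveDeletes source;
    private final BitSet deletedAtOpen;

    PointInTimeReader(LiveDeletes source) {
        this.source = source;
        this.deletedAtOpen = source.copy();
    }

    boolean isDeleted(int doc) {
        return deletedAtOpen.get(doc);
    }

    PointInTimeReader reopen() {
        return new PointInTimeReader(source);
    }

    // Deleting through the reader is simply not allowed in this sketch.
    void deleteDocument(int doc) {
        throw new UnsupportedOperationException("delete through the writer, then reopen");
    }
}
```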