[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675177#action_12675177 ]
Michael McCandless commented on LUCENE-1516:
--------------------------------------------

{quote}
> since you can't predict when IW materializes deletes, your reader
> will suddenly see a bunch of deletes appear.

The reader would need to be reopened to see the deletes. Isn't that expected behavior?
{quote}

Ahh right, so long as we keep an internal (private) clone, materializing the deletes won't affect the external reader.

{quote}
> Instead of having to merge readers, I think we need a single
> source to obtain an SR from

I like this; however, how would IR.clone work?
{quote}

It should work fine? The single source would only be used internally by IW (for merging, for materializing deletes, for the internal reader).

bq. I like having the internal reader separate from the external reader.

I think we should keep that separation.

{quote}
The main reason to expose IR from IW is to allow delete by doc id and norms updates (eventually column stride fields updates). I don't see how we can grab a reader during a merge, and block realtime deletes occurring on the external reader. However it is difficult to rectify deletes to an external SR that's been merged away.

It seems like we're getting closer to using a unique long UID for each doc that is carried over between merges. I was going to implement this above LUCENE-1516; however, we may want to make UIDs a part of LUCENE-1516 to implement the behavior we're discussing.

If the updates to SR are queued, then it seems like the only way to achieve this is a doc UID. This way merges can happen in the background, and the IR has a mechanism for mapping its queue to the newly merged segments when flushed. Hopefully we aren't wreaking havoc with the IndexReader API?
{quote}

But... do we need delete by docID once we have realtime search? I think the last compelling reason to keep IR's delete by docID was immediacy, but realtime search can give us that, from IW, even when deleting by Term or Query?
(Your app can always add that long UID if it doesn't already have something usable.)

docIDs are free to change inside IW. I don't see how we can hand out a reader, allow deletes by docID to it, and merge those deletes back in at a later time, unless we track the genealogy of the segments?

{quote}
The scenario I think we're missing is if there are multiple cloned SRs out there. With the model where IW checks out an SR, how do we allow cloning? A clone's updates will be placed into a central original SR queue? The queue is drained automatically on a merge or IW.flush? What happens when we want the IR deletes to be searchable without flushing to disk? Do a reopen/clone?
{quote}

This is why I think all changes must be done through IW if you've opened a reader from it. In fact, with the addition of realtime search to Lucene, if we also add updating norms/column-stride fields to IW, can't we move away from allowing any changes via IR? (Ie deprecate deleteDocuments/setNorms/etc.)

{quote}
> It's not necessary for IW to write new .del files when it
> materializes deletes.

Good point, DocumentsWriter.applyDeletes shouldn't be flushing to disk and this sounds like a test case to add to TestIndexWriterReader.
{quote}

Well, if IW has no persistent reader to hold the deletes, it must keep doing what it does now (flush immediately to disk)?

{quote}
> IW.reopenInternalReader only does a clone not a reopen; however
> does it cover the newly flushed segment?

The segmentinfos is obtained from the Writer. In the test case testIndexWriterReopenSegment it looks like using clone reopens the new segments.
{quote}

Wait, where is this test? Maybe you need to svn add it? And, clone should not be reopening segments...?

{quote}
> I think it's better if no deletes appear, ever, until you reopen
> your reader. Maybe we simply prevent deletion through the IR?

Preventing deletion through the IR would seem to defeat the purpose of the patch unless there's some alternative mechanism for deleting by doc id?
{quote}

See above.

{quote}
> commitMergedDeletes to decouple computing the new BitVector from
> writing the .del file to disk.

A hidden method I never noticed. I'll keep it in mind.
{quote}

It's actually very important. This is how IW allows deletes to materialize to docIDs while a merge is running -- any newly materialized deletes against the just-merged segments are coalesced and carried over to the newly created segment. Any further deletes must be done against the docIDs in the new segment (which is why I don't see how we can allow deletes by docID to happen against a checked out reader).

{quote}
> It seems like reader.reopen() (where reader was obtained with
> IW.getReader()) doesn't do the right thing? (ie it's looking for the
> most recent segments_N in the Directory, but it should be looking for
> it @ IW.segmentInfos).

Using the reopen method implementation for a Reader with IW does not seem necessary. It seems like it could call clone underneath?
{quote}

Well, clone should be very different from reopen. It seems like calling reader.reopen() (on a reader obtained from the writer) should basically do the same thing as calling writer.getReader(). Ie they are nearly synonyms? (Except for a small difference in ref counting -- I think writer.getReader() should always incRef, but reopen only incRefs if it returns a new reader.)

> Integrate IndexReader with IndexWriter
> ---------------------------------------
>
> Key: LUCENE-1516
> URL: https://issues.apache.org/jira/browse/LUCENE-1516
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.4
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch,
> LUCENE-1516.patch, LUCENE-1516.patch
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index.
> While methods such as IW.deleteDocuments enable
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates.
>
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone.
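
For readers following the commitMergedDeletes point in the comment, here is a rough, self-contained sketch of the idea of carrying deletes over a merge. It is a toy model, not Lucene's code: the class `MergedDeletesSketch`, the method `carryOverDeletes`, and the use of `java.util.BitSet` in place of Lucene's BitVector are all assumptions made for illustration. It assumes the merge copies only the docs that were live when it started, so a delete that materializes while the merge is running has to be translated from the old segment's docID to the docID the same doc received in the merged segment.

```java
import java.util.BitSet;

/** Toy model (hypothetical, not Lucene code) of carrying deletes over a merge. */
final class MergedDeletesSketch {

    /**
     * @param maxDocs      number of docs in each source segment
     * @param atMergeStart per-segment deletes when the merge started
     *                     (those docs are not copied into the merged segment)
     * @param current      per-segment deletes now (a superset of atMergeStart)
     * @return the deletes that arrived during the merge, expressed as
     *         docIDs of the merged segment
     */
    static BitSet carryOverDeletes(int[] maxDocs, BitSet[] atMergeStart, BitSet[] current) {
        BitSet mergedDeletes = new BitSet();
        int newDocId = 0; // next docID assigned in the merged segment
        for (int seg = 0; seg < maxDocs.length; seg++) {
            for (int doc = 0; doc < maxDocs[seg]; doc++) {
                if (atMergeStart[seg].get(doc)) {
                    continue; // dropped by the merge, so it has no new docID
                }
                if (current[seg].get(doc)) {
                    mergedDeletes.set(newDocId); // deleted while the merge ran
                }
                newDocId++;
            }
        }
        return mergedDeletes;
    }

    public static void main(String[] args) {
        // Segment A: 4 docs, doc 1 already deleted at merge start.
        BitSet aStart = new BitSet();
        aStart.set(1);
        // Segment B: 3 docs, nothing deleted at merge start.
        BitSet bStart = new BitSet();

        // While the merge runs, A's doc 3 and B's doc 0 are deleted.
        BitSet aNow = (BitSet) aStart.clone();
        aNow.set(3);
        BitSet bNow = (BitSet) bStart.clone();
        bNow.set(0);

        BitSet merged = carryOverDeletes(
                new int[] {4, 3},
                new BitSet[] {aStart, bStart},
                new BitSet[] {aNow, bNow});
        System.out.println(merged); // prints {2, 3}
    }
}
```

In this model, once the merged segment replaces A and B, docIDs from A and B no longer mean anything, which is the concern raised in the comment about accepting deletes by docID against a checked-out reader.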