Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> Are you referring to the IW.pendingCommit SegmentInfos variable?

No, I'm referring to segmentInfos. (pendingCommit is the "snapshot" of segmentInfos taken when committing...)

> When you say "flushed" you are referring to the IW.prepareCommit method?

No, I'm referring to "flush"... it writes a new segment but not a new segments_N, does not sync the files, and does not invoke the deletion policy.

> I think step #1 is important and should be generally useful outside of
> realtime search, however it's unclear how/when calls to IW.deleteDocument
> will reflect in IW.getReader?

You'd have to flush (to materialize pending deletions inside IW) then reopen the reader, to see any deletions done via the writer. But I think instead realtime search would do deletions via the reader (because if you use IW you're updating deletes through the Directory = too slow).

> Interleaving deletes with documents added isn't possible because if the
> documents are in the IW ram buffer, they are not necessarily deleted

Well, we buffer the delete and then on flush we materialize the delete. So if you add a doc with field X=77, then delete-by-term X:77, then flush, you'll flush a 1 document segment whose only document is marked as deleted.

But I think for realtime we don't want to be using IW's deletion at all. We should do all deletes via the IndexReader. In fact if IW has handed out a reader (via getReader()) and that reader (or a reopened derivative) remains open we may have to block deletions via IW. Not sure... somehow IW & IR have to "split" the write lock else we may need to merge deletions somehow.

> If this is swapped in later how is the system realtime except perhaps deletes?

We have to test performance to measure the net add -> search latency. For many apps this approach may be plenty fast. If your IO system is an SSD it could be extremely fast. Swapping in RAMDir just makes it faster w/o changing the basic approach.
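To make the buffered-delete case above concrete (add a doc with X=77, delete-by-term X:77, then flush), here is a rough toy model. It is not Lucene code: ToyWriter, ToySegment and PendingDelete are made-up names, and the sketch assumes a delete only applies to docs buffered before it and ignores previously flushed segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes are Lucene APIs.
class BufferedDeleteSketch {

    /** A flushed segment: its documents plus a per-document deleted flag. */
    static final class ToySegment {
        final List<Map<String, String>> docs;
        final boolean[] deleted;

        ToySegment(List<Map<String, String>> docs, boolean[] deleted) {
            this.docs = docs;
            this.deleted = deleted;
        }

        int liveCount() {
            int live = 0;
            for (boolean d : deleted) {
                if (!d) live++;
            }
            return live;
        }
    }

    /** A buffered delete-by-term, remembering how many docs were buffered when it arrived. */
    static final class PendingDelete {
        final String field;
        final String value;
        final int docUpto; // applies to buffered docs [0, docUpto)

        PendingDelete(String field, String value, int docUpto) {
            this.field = field;
            this.value = value;
            this.docUpto = docUpto;
        }
    }

    static final class ToyWriter {
        private final List<Map<String, String>> bufferedDocs = new ArrayList<>();
        private final List<PendingDelete> bufferedDeletes = new ArrayList<>();

        void addDocument(String field, String value) {
            bufferedDocs.add(Map.of(field, value));
        }

        // The delete is only recorded here; nothing is marked deleted yet.
        void deleteByTerm(String field, String value) {
            bufferedDeletes.add(new PendingDelete(field, value, bufferedDocs.size()));
        }

        // Flush writes the buffered docs as one segment and materializes the deletes.
        ToySegment flush() {
            boolean[] deleted = new boolean[bufferedDocs.size()];
            for (PendingDelete del : bufferedDeletes) {
                for (int i = 0; i < del.docUpto; i++) {
                    if (del.value.equals(bufferedDocs.get(i).get(del.field))) {
                        deleted[i] = true;
                    }
                }
            }
            ToySegment segment = new ToySegment(new ArrayList<>(bufferedDocs), deleted);
            bufferedDocs.clear();
            bufferedDeletes.clear();
            return segment;
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.addDocument("X", "77");
        writer.deleteByTerm("X", "77");
        ToySegment segment = writer.flush();
        // prints "1 doc(s), 0 live"
        System.out.println(segment.docs.size() + " doc(s), " + segment.liveCount() + " live");
    }
}
```

The point of the sketch is only that the segment still contains the document after flush; the delete shows up as a flag on it, and a reader would need to be reopened after the flush to observe that flag.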

> Adding support for multiple transactions at once on IndexWriter outside of
> the realtime transactions seems to require a lot of refactoring.

Besides the transaction log (for crash recovery), which should fit "above" Lucene nicely, what else is needed for realtime beyond the single-transaction support Lucene already provides?

Mike