Re: Realtime Search

Michael McCandless Fri, 09 Jan 2009 11:15:42 -0800

Jason Rutherglen wrote:

> Are you referring to the IW.pendingCommit SegmentInfos variable?


No, I'm referring to segmentInfos.  (pendingCommit is the "snapshot"
of segmentInfos taken when committing...).

> When you say "flushed" you are referring to the IW.prepareCommit method?

No, I'm referring to "flush"... it writes a new segment but not a new
segments_N, does not sync the files, and does not invoke the deletion
policy.
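
To make the flush vs. commit distinction concrete, here is a toy model in
plain Java.  It is only a sketch: the class and its members are hypothetical
names chosen to echo the terms above, not IndexWriter's actual code, and
file syncing and the deletion policy are reduced to a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not Lucene's IndexWriter internals.
class FlushVsCommitSketch {
    // Live, in-memory list of segments (plays the role of segmentInfos).
    final List<String> segmentInfos = new ArrayList<>();
    // Published snapshots (each entry plays the role of one segments_N file).
    final List<List<String>> committedGenerations = new ArrayList<>();
    private int nextSegment = 0;

    // flush: record a new segment in memory; nothing is published.
    String flush() {
        String name = "_" + nextSegment++;
        segmentInfos.add(name);
        return name;
    }

    // commit: snapshot segmentInfos (the "pendingCommit"), then publish it.
    int commit() {
        List<String> pendingCommit = new ArrayList<>(segmentInfos);
        // A real writer would sync files and invoke the deletion policy here.
        committedGenerations.add(pendingCommit);
        return committedGenerations.size();
    }
}
```

In this model, segments flushed after a commit show up in segmentInfos but
not in any published generation until the next commit.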

> I think step #1 is important and should be generally useful outside of 
> realtime search, however it's unclear how/when calls to IW.deleteDocument 
> will reflect in IW.getReader?

You'd have to flush (to materialize pending deletions inside IW) then
reopen the reader, to see any deletions done via the writer.  But I
think instead realtime search would do deletions via the reader
(because deletes done via IW are applied through the Directory, which
is too slow).

> Interleaving deletes with documents added isn't possible because if the 
> documents are in the IW ram buffer, they are not necessarily deleted

Well, we buffer the delete and then on flush we materialize the
delete.  So if you add a doc with field X=77, then delete-by-term
X:77, then flush, you'll flush a 1 document segment whose only
document is marked as deleted.
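
The X=77 example can be sketched as a toy model in plain Java.  Everything
here is hypothetical (class, method and field names), documents are reduced
to a single field value, and deletes are applied only to the buffered
documents.  It also assumes that a buffered delete affects only documents
added before it, which the description above does not spell out.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Lucene's actual implementation.
class BufferedDeleteSketch {
    // A flushed segment: its documents plus a bit set marking deleted ones.
    static final class Segment {
        final List<String> docs;
        final BitSet deleted;
        Segment(List<String> docs, BitSet deleted) {
            this.docs = docs;
            this.deleted = deleted;
        }
        int liveDocCount() {
            return docs.size() - deleted.cardinality();
        }
    }

    // Buffered documents, each reduced to the value of its field X.
    private final List<String> bufferedDocs = new ArrayList<>();
    // Term -> number of docs buffered when the delete arrived (assumption:
    // the delete applies only to docs added before that point).
    private final Map<String, Integer> deleteUpto = new HashMap<>();

    void addDocument(String x) {
        bufferedDocs.add(x);
    }

    // Buffer the delete; nothing is marked as deleted yet.
    void deleteByTerm(String x) {
        deleteUpto.put(x, bufferedDocs.size());
    }

    // Flush: write the segment and materialize the buffered deletes.
    Segment flush() {
        List<String> docs = new ArrayList<>(bufferedDocs);
        BitSet deleted = new BitSet(docs.size());
        for (int i = 0; i < docs.size(); i++) {
            Integer upto = deleteUpto.get(docs.get(i));
            if (upto != null && i < upto) {
                deleted.set(i);
            }
        }
        bufferedDocs.clear();
        deleteUpto.clear();
        return new Segment(docs, deleted);
    }
}
```

Calling addDocument("77"), then deleteByTerm("77"), then flush() yields a
segment with one document and a live count of zero.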

But I think for realtime we don't want to be using IW's deletion at
all.  We should do all deletes via the IndexReader.  In fact if IW has
handed out a reader (via getReader()) and that reader (or a reopened
derivative) remains open we may have to block deletions via IW.  Not
sure... somehow IW & IR have to "split" the write lock, or else we may
need to merge their deletions.

> If this is swapped in later how is the system realtime except perhaps deletes?

We have to test performance to measure the net add -> search latency.
For many apps this approach may be plenty fast.  If your IO system is
an SSD it could be extremely fast.  Swapping in RAMDir just makes it
faster w/o changing the basic approach.

> Adding support for multiple transactions at once on IndexWriter outside of 
> the realtime transactions seems to require a lot of refactoring.

Besides the transaction log (for crash recovery), which should fit
"above" Lucene nicely, what else is needed for realtime beyond the
single-transaction support Lucene already provides?

Mike
