Jason Rutherglen wrote:
One of the bottlenecks I have noticed testing Ocean realtime search
is the delete process, which involves writing out several files for
what may be a single document delete in SegmentReader. The best way
to handle the deletes is to simply keep them in memory without
flushing them to disk, saving the cost of writing out an entire
BitVector per delete. The deletes are saved in the transaction log,
which is replayed on recovery.
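Roughly what I have in mind, as a sketch only (TransactionLog below
is a made-up stand-in for Ocean's log, not a Lucene class, and this
assumes package access to Lucene's internal BitVector):

    import java.io.IOException;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.BitVector;

    // Stand-in for Ocean's transaction log; not a Lucene class.
    interface TransactionLog {
      void append(int docID) throws IOException;
    }

    // Buffer deletes in memory, log each one for crash recovery,
    // and only write the full BitVector at checkpoints.
    class BufferedDeletes {
      private final BitVector deletedDocs;  // in-memory only
      private final TransactionLog log;

      BufferedDeletes(int maxDoc, TransactionLog log) {
        this.deletedDocs = new BitVector(maxDoc);
        this.log = log;
      }

      synchronized void delete(int docID) throws IOException {
        log.append(docID);       // durable record, replayed on recovery
        deletedDocs.set(docID);  // visible to searches immediately
      }

      // Called periodically, not per delete.
      synchronized void checkpoint(Directory dir, String delFileName)
          throws IOException {
        deletedDocs.write(dir, delFileName);
      }
    }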
I am not sure of the best way to approach this; perhaps it is to
create a custom class that inherits from SegmentReader. It could
reuse the existing reopen and also provide a way to set the
deletedDocs BitVector. It could also share a single FieldsReader,
with locking around it, among all SegmentReaders of the segment;
in the current architecture each new SegmentReader opens its own
FieldsReader, which is suboptimal. The deletes would still be saved
to disk, but periodically, like a checkpoint, rather than per delete.
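In code form it might look something like this (a sketch only; it
assumes the class lives in org.apache.lucene.index so that the
package-private SegmentReader, FieldsReader, and deletedDocs field
are reachable, and the constructor plumbing is omitted):

    package org.apache.lucene.index;

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.util.BitVector;

    // Sketch: a SegmentReader that accepts an in-memory deletedDocs
    // BitVector and shares one FieldsReader across all readers of
    // the segment instead of opening a new one per reader.
    class RealtimeSegmentReader extends SegmentReader {
      private FieldsReader sharedFieldsReader;  // one per segment

      // Swap in the buffered deletes without writing anything to disk.
      synchronized void setDeletedDocs(BitVector deletes) {
        this.deletedDocs = deletes;
      }

      // FieldsReader is not thread safe, so all SegmentReaders of
      // the segment funnel stored-field loads through this lock.
      Document sharedDoc(int docID) throws IOException {
        synchronized (sharedFieldsReader) {
          return sharedFieldsReader.doc(docID, null);
        }
      }
    }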
Or ... maybe you could do the deletes through IndexWriter (somehow, if
we can get docIDs properly) and then SegmentReaders could somehow tap
into the buffered deleted docIDs that IndexWriter already maintains.
IndexWriter is already doing this buffering and flush/commit anyway.
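Something like this, maybe (very hand-wavy: neither method below
exists today, and IndexWriter actually buffers deleted Terms rather
than docIDs, only resolving them to docIDs at flush):

    // Hypothetical glue: pull IndexWriter's buffered deletes for one
    // segment into a SegmentReader's in-memory deleted docs, skipping
    // the .del file round trip through the Directory.
    void syncDeletes(IndexWriter writer, SegmentReader reader,
                     String segmentName) throws IOException {
      // Hypothetical accessor: the docIDs IndexWriter has buffered
      // as deleted against this segment.
      int[] pending = writer.getBufferedDeletedDocIDs(segmentName);
      for (int i = 0; i < pending.length; i++) {
        // Hypothetical: mark the doc deleted in RAM only, no flush.
        reader.markDeletedInRAM(pending[i]);
      }
    }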
We've also discussed at one point creating an IndexReader impl that
searches the RAM buffer that DocumentsWriter writes to when adding
documents. I think it's easier than it sounds at first glance,
because DocumentsWriter is in fact writing the postings in nearly the
same format as is used when the segment is flushed.
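The shape of it might be roughly this (everything here is
hypothetical: DocumentsWriter has no such accessors today, and the
rest of IndexReader's abstract methods are left out of the sketch):

    package org.apache.lucene.index;

    import java.io.IOException;

    // Hypothetical IndexReader over DocumentsWriter's RAM buffer.
    // Because the in-RAM postings are in nearly the on-disk format,
    // a TermDocs implementation over them should be fairly direct.
    abstract class RAMBufferReader extends IndexReader {
      private final DocumentsWriter dw;

      RAMBufferReader(DocumentsWriter dw) {
        this.dw = dw;
      }

      public int maxDoc() {
        return dw.getNumDocsInRAM();  // hypothetical accessor
      }

      public TermDocs termDocs() throws IOException {
        return dw.ramTermDocs();  // hypothetical: walk in-RAM postings
      }

      // numDocs(), document(), norms(), etc. left abstract here.
    }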
So if we had this IndexReader impl, plus extended SegmentReader so it
could tap into pending deletes buffered in IndexWriter, you could get
realtime search without having to use Directory as an intermediary.
Though, it is clearly quite a bit more work :)
Mike