Hi Mike,

Would a new sorted list or some other sorted structure replace the
hashtable as the term dictionary? That still seems like an unsolved
issue.
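To make the question concrete, here is the kind of thing I have in mind.
This is purely a sketch with invented names (LivePostingTable,
snapshotSortedDocFreq), not actual DocumentsWriter internals: at reopen,
the per-term docFreq gets copied out of the live hashtable into a sorted
snapshot.

    import java.util.SortedMap;
    import java.util.TreeMap;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch only: stands in for the live posting hashtable
    // that DocumentsWriter maintains while documents are being indexed.
    class LivePostingTable {
        // term -> docFreq; the indexing thread updates this as it goes.
        final ConcurrentHashMap<String, Integer> docFreq =
            new ConcurrentHashMap<String, Integer>();

        // At reopen: freeze docFreq into a sorted snapshot so a RAM
        // reader can enumerate terms in order while indexing keeps
        // mutating the live table.
        SortedMap<String, Integer> snapshotSortedDocFreq() {
            return new TreeMap<String, Integer>(docFreq);
        }
    }

The writer keeps mutating its hashtable; only the frozen TreeMap is
handed to the reader, so term enumeration sees a stable term dictionary
in sorted order.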
Jason

On Tue, Sep 9, 2008 at 5:29 AM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
> This would just tap into the live hashtable that DocumentsWriter*
> maintain for the posting lists... except the docFreq will need to be
> copied away on reopen, I think.
>
> Mike
>
> Jason Rutherglen wrote:
>
>> Term dictionary? I'm curious how that would be solved?
>>
>> On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
>> <[EMAIL PROTECTED]> wrote:
>>>
>>> Yonik Seeley wrote:
>>>
>>>>> I think it's quite feasible, but, it'd still have a "reopen" cost
>>>>> in that any buffered delete by term or query would have to be
>>>>> "materialized" into docIDs on reopen. Though, if this somehow turns
>>>>> out to be a problem, in the future we could do this materializing
>>>>> immediately, instead of buffering, if we already have a reader open.
>>>>
>>>> Right... it seems like re-using readers internally is something we
>>>> could already be doing in IndexWriter.
>>>
>>> True.
>>>
>>>>> Flushing is somewhat tricky because any open RAM readers would then
>>>>> have to cut over to the newly flushed segment once the flush
>>>>> completes, so that the RAM buffer can be recycled for the next
>>>>> segment.
>>>>
>>>> Re-use of a RAM buffer doesn't seem like such a big deal.
>>>>
>>>> But, how would you maintain a static view of an index...?
>>>>
>>>> IndexReader r1 = indexWriter.getCurrentIndex()
>>>> indexWriter.addDocument(...)
>>>> IndexReader r2 = indexWriter.getCurrentIndex()
>>>>
>>>> I assume r1 will have a view of the index before the document was
>>>> added, and r2 after?
>>>
>>> Right, getCurrentIndex would return a MultiReader that includes a
>>> SegmentReader for each segment in the index, plus a "RAMReader" that
>>> searches the RAM buffer. That RAMReader is a tiny shell class that
>>> would basically just record the max docID it's allowed to go up to
>>> (the docID as of when it was opened), and stop enumerating docIDs
>>> (eg in the TermDocs) when it hits a docID beyond that limit.
>>>
>>> For reading stored fields and term vectors, which are now flushed
>>> immediately to disk, we need to somehow get an IndexInput from the
>>> IndexOutputs that IndexWriter holds open on these files. Or, maybe,
>>> just open new IndexInputs?
>>>
>>>> Another thing that will help is if users could get their hands on
>>>> the sub-readers of a multi-segment reader. Right now that is hidden
>>>> in MultiSegmentReader and makes updating anything incrementally
>>>> difficult.
>>>
>>> Besides what's handled by MultiSegmentReader.reopen already, what
>>> else do you need to incrementally update?
>>>
>>> Mike
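To flesh out the "tiny shell class" Mike describes above, here is a
hedged sketch. The wrapper name DocLimitTermDocs is invented; the
TermDocs interface it implements is Lucene's 2.x-era API. It records the
max docID in force when the reader was opened and refuses to enumerate
past it:

    import java.io.IOException;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;
    import org.apache.lucene.index.TermEnum;

    // Hypothetical wrapper: hides every docID at or beyond the limit
    // recorded when the reader was opened.
    class DocLimitTermDocs implements TermDocs {
        private final TermDocs in;
        private final int maxDoc;  // docIDs >= maxDoc came after open

        DocLimitTermDocs(TermDocs in, int maxDoc) {
            this.in = in;
            this.maxDoc = maxDoc;
        }

        public void seek(Term term) throws IOException { in.seek(term); }
        public void seek(TermEnum terms) throws IOException { in.seek(terms); }
        public int doc() { return in.doc(); }
        public int freq() { return in.freq(); }

        // docIDs arrive in increasing order, so once we pass the limit
        // the enumeration is simply over.
        public boolean next() throws IOException {
            return in.next() && in.doc() < maxDoc;
        }

        public int read(int[] docs, int[] freqs) throws IOException {
            int n = in.read(docs, freqs);
            while (n > 0 && docs[n - 1] >= maxDoc) n--;  // drop too-new docs
            return n;
        }

        public boolean skipTo(int target) throws IOException {
            return in.skipTo(target) && in.doc() < maxDoc;
        }

        public void close() throws IOException { in.close(); }
    }

Because docIDs are ascending within an enumeration, cutting off in
next(), read(), and skipTo() is enough to hide documents added after the
reader was opened.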
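And a sketch of how the proposed getCurrentIndex (not an existing API;
the class and method names here are invented) might assemble the
point-in-time view from Yonik's r1/r2 example, combining the per-segment
readers with a RAM-buffer reader, whose enumerations would be wrapped as
above, via Lucene's MultiReader:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;

    // Hypothetical sketch of the proposed getCurrentIndex: one reader
    // per flushed segment plus a reader over the RAM buffer, frozen at
    // the current max docID so later addDocument calls stay invisible.
    class CurrentIndexSketch {
        static IndexReader currentIndex(IndexReader[] segmentReaders,
                                        IndexReader ramReader)
                throws IOException {
            IndexReader[] subs = new IndexReader[segmentReaders.length + 1];
            System.arraycopy(segmentReaders, 0, subs, 0,
                             segmentReaders.length);
            subs[segmentReaders.length] = ramReader;
            return new MultiReader(subs);
        }
    }

Under this sketch, r1 and r2 would each wrap their own frozen RAMReader,
so r1 keeps its pre-addDocument view while r2 sees the new document.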