> On disk the deletes get bits again and also for already committed segments.
I was thinking we'd use sequence ids for the on disk segments, as they
were deleted from, and yes, that'd imply that the sequence id deletes
would be converted back to BVs when written to disk. That's probably
not a big deal.

On Tue, Jul 20, 2010 at 10:09 AM, Uwe Schindler <[email protected]> wrote:
>> The biggest downside of sequence IDs is increase RAM usage right? Ie, today
>> each deletion takes 1 bit, but with sequence IDs it's 32X bigger (an int), I
>> think? Are there other downsides?
>
> It only takes this much space for in-ram deletes of in-ram segments. On disk
> the deletes get bits again and also for already committed segments. That is
> what Michael told me in Berlin.
>
>> Then, checking if a doc is deleted becomes an int compare instead of a bit
>> lookup, right? And, we don't have to clone the deletions during reopen.
>>
>> So this is an appropriate tradeoff for apps that need to reopen after every
>> change to the index. But for apps reopening less often (eg maybe up to 10X
>> per second), this may not be a good tradeoff (ie they are willing to spend
>> more time in the reopen if it reduces RAM footprint). Maybe the deletes
>> impl should be pluggable and apps can pick...
>>
>> Mike
>>
>> On Tue, Jul 20, 2010 at 12:33 PM, Jason Rutherglen
>> <[email protected]> wrote:
>> > Michael B and I have been discussing the per segment doc writers and
>> > RT patches/branch. A small improvement we can add to trunk from this
>> > is the sequence IDs for deletes, which would improve the existing NRT
>> > system by avoiding the cloning of bit vectors.
>> > Implementing segment deleted docs via sequence IDs would additionally
>> > provide a path way for the future RT branch merge into trunk. It could
>> > be best to break up the RT patches as much as possible as they touch
>> > on many parts of the Lucene IndexWriter subsystem.
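
To make the tradeoff discussed in this thread concrete, here is a rough sketch of what sequence-id deletes could look like. It is only an illustration and not Lucene's actual API: the class name SequenceIdDeletes, its methods, and the NOT_DELETED sentinel are all made up for this example. It assumes one int per document, sequence ids that stay below Integer.MAX_VALUE, and it ignores thread safety entirely.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch of sequence-id deletes; none of these names exist in Lucene. */
final class SequenceIdDeletes {
    // Sentinel meaning "never deleted"; assumed larger than any real sequence id.
    static final int NOT_DELETED = Integer.MAX_VALUE;

    // One int per document (32x the size of a single bit per document).
    private final int[] deletedAt;

    SequenceIdDeletes(int maxDoc) {
        deletedAt = new int[maxDoc];
        Arrays.fill(deletedAt, NOT_DELETED);
    }

    /** Record that docId was deleted by the operation with the given sequence id. */
    void delete(int docId, int sequenceId) {
        // Keep the earliest delete so readers opened later still see it.
        if (sequenceId < deletedAt[docId]) {
            deletedAt[docId] = sequenceId;
        }
    }

    /** An int compare instead of a bit lookup: the reader only sees deletes up to its own sequence id. */
    boolean isDeleted(int docId, int readerSequenceId) {
        return deletedAt[docId] <= readerSequenceId;
    }

    /** Convert back to one bit per document, e.g. before writing the deletes to disk. */
    BitSet toBitSet(int sequenceId) {
        BitSet bits = new BitSet(deletedAt.length);
        for (int doc = 0; doc < deletedAt.length; doc++) {
            if (deletedAt[doc] <= sequenceId) {
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

In this sketch every reader shares the same int array and only carries its own sequence id, which is why nothing needs to be cloned on reopen. The toBitSet call stands in for the conversion back to bit vectors when a segment's deletes are written to disk.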
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
