> The biggest downside of sequence IDs is increased RAM usage, right? I.e., today
> each deletion takes 1 bit, but with sequence IDs it's 32X bigger (an int), I
> think? Are there other downsides?

It only takes this much space for in-RAM deletes of in-RAM segments. On disk,
and for already committed segments, the deletes are stored as bits again. That
is what Michael told me in Berlin.

> Then, checking if a doc is deleted becomes an int compare instead of a bit
> lookup, right? And, we don't have to clone the deletions during reopen.
>
> So this is an appropriate tradeoff for apps that need to reopen after every
> change to the index. But for apps reopening less often (e.g. maybe up to 10X
> per second), this may not be a good tradeoff (i.e. they are willing to spend
> more time in the reopen if it reduces RAM footprint). Maybe the deletes
> impl should be pluggable and apps can pick...
>
> Mike
>
> On Tue, Jul 20, 2010 at 12:33 PM, Jason Rutherglen
> <[email protected]> wrote:
> > Michael B and I have been discussing the per segment doc writers and
> > RT patches/branch. A small improvement we can add to trunk from this
> > is the sequence IDs for deletes, which would improve the existing NRT
> > system by avoiding the cloning of bit vectors.
> > Implementing segment deleted docs via sequence IDs would additionally
> > provide a pathway for the future RT branch merge into trunk. It could
> > be best to break up the RT patches as much as possible as they touch
> > on many parts of the Lucene IndexWriter subsystem.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
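
To make the tradeoff in this thread concrete, here is a minimal single-threaded sketch of the idea as I understand it. It is not the Lucene implementation or anything from the RT branch: the class and method names (SequenceIdDeletesSketch, SeqIdDeletes, delete, isDeleted) and the NOT_DELETED sentinel are made up for illustration. Each doc gets one int holding the sequence ID of the operation that deleted it, and a reader is represented only by the sequence ID at which it was opened. A java.util.BitSet stands in for the current bit vector approach.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch, not Lucene code. Thread safety is ignored here.
class SequenceIdDeletesSketch {

    // Assumed sentinel meaning "this doc has not been deleted".
    static final int NOT_DELETED = Integer.MAX_VALUE;

    static final class SeqIdDeletes {
        // One int (32 bits) per doc instead of one bit per doc: the 32X cost.
        private final int[] deleteSeqIds;

        SeqIdDeletes(int maxDoc) {
            deleteSeqIds = new int[maxDoc];
            Arrays.fill(deleteSeqIds, NOT_DELETED);
        }

        // Record that the operation with the given sequence ID deleted docId.
        // The earliest delete wins, so older readers keep a stable view.
        void delete(int docId, int seqId) {
            if (seqId < deleteSeqIds[docId]) {
                deleteSeqIds[docId] = seqId;
            }
        }

        // The deleted check is an int compare: a reader sees the delete only
        // if it was opened at or after the deleting operation.
        boolean isDeleted(int docId, int readerSeqId) {
            return deleteSeqIds[docId] <= readerSeqId;
        }
    }

    public static void main(String[] args) {
        SeqIdDeletes deletes = new SeqIdDeletes(8);

        int readerA = 1;       // reader opened at sequence ID 1
        deletes.delete(3, 2);  // doc 3 deleted by operation 2
        int readerB = 2;       // "reopen": only a newer int, nothing is cloned

        System.out.println(deletes.isDeleted(3, readerA)); // false
        System.out.println(deletes.isDeleted(3, readerB)); // true

        // Bit vector approach for comparison: 1 bit per doc, but each reopen
        // needs its own copy so that later deletes do not leak into it.
        BitSet liveDeletes = new BitSet(8);
        BitSet readerSnapshot = (BitSet) liveDeletes.clone();
        liveDeletes.set(3);
        System.out.println(readerSnapshot.get(3)); // false
    }
}
```

Both readers share the same int array, which is why no clone is needed on reopen. The price is the per-doc int, which per the reply above would only be paid for in-RAM segments.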
