> Breaking up RT patches into baby steps would be great :)
> Actually is the RT branch active (I haven't seen commits going
> in).

From what we discussed, my impression is that the RT changes
will be substantial, whereas the sequence IDs seem to be
something that can be implemented in trunk now. That way at
least a small piece of RT would be implemented and tested: a
small, isolated improvement.

> The biggest downside of sequence IDs is increase RAM usage
> right?

Yes; however, garbage collection would decrease, because the
deleted-docs bit vectors would no longer be cloned on reopen.
We should make the seq ID deletes pluggable.
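
To make the pluggable idea a bit more concrete, here is a rough
sketch. None of these names exist in Lucene: SegmentDeletes,
DeletesView, BitSetDeletes and SeqIdDeletes are made up for
illustration. I'm also assuming a single writer thread, and that
Integer.MAX_VALUE can serve as the "never deleted" marker.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Read-only, point-in-time view handed to a reader (hypothetical name). */
interface DeletesView {
    boolean isDeleted(int docId);
}

/** Hypothetical plug-in point; not an existing Lucene API. */
interface SegmentDeletes {
    void delete(int docId);   // record a delete against the live segment
    DeletesView snapshot();   // called on each reader (re)open
    long ramBytes();          // rough payload size, ignoring object headers
}

/** One bit per doc; every snapshot clones the bits, creating garbage per reopen. */
final class BitSetDeletes implements SegmentDeletes {
    private final BitSet bits;
    private final int maxDoc;

    BitSetDeletes(int maxDoc) {
        this.maxDoc = maxDoc;
        this.bits = new BitSet(maxDoc);
    }

    public void delete(int docId) { bits.set(docId); }

    public DeletesView snapshot() {
        BitSet copy = (BitSet) bits.clone(); // copies roughly maxDoc / 8 bytes
        return copy::get;
    }

    public long ramBytes() { return (maxDoc + 7L) / 8L; }
}

/** One int per doc; snapshots share the array and only capture a sequence ID. */
final class SeqIdDeletes implements SegmentDeletes {
    private static final int LIVE = Integer.MAX_VALUE; // assumed sentinel: never deleted
    private final int[] deletedAt;                     // sequence ID at which each doc was deleted
    private int seq = 0;                               // advances on every new delete

    SeqIdDeletes(int maxDoc) {
        deletedAt = new int[maxDoc];
        Arrays.fill(deletedAt, LIVE);
    }

    public void delete(int docId) {
        // Keep the first delete's sequence ID so older views stay stable.
        if (deletedAt[docId] == LIVE) deletedAt[docId] = ++seq;
    }

    public DeletesView snapshot() {
        final int readerSeq = seq; // O(1), nothing is cloned
        return docId -> deletedAt[docId] <= readerSeq;
    }

    public long ramBytes() { return 4L * deletedAt.length; }
}
```

For maxDoc = 1000 the bit set payload comes to 125 bytes and the
int array to 4000 bytes, i.e. 32X larger, while the seq ID
snapshot copies nothing. An app could then pick whichever
implementation matches how often it reopens.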

> Then, checking if a doc is deleted becomes an int compare
> instead of a bit lookup, right?

Right, it'd be an int compare. I think this'd be OK?
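
For comparison, the two per-doc checks might look roughly like
this. This is only a sketch: DeleteChecks and both method names
are hypothetical, and it assumes undeleted docs hold
Integer.MAX_VALUE in the sequence ID array.

```java
final class DeleteChecks {
    private DeleteChecks() {}

    /** Bit lookup: one long load, then a shift and a mask. */
    static boolean isDeletedBit(long[] words, int docId) {
        // Java uses only the low 6 bits of the shift count for a long,
        // so (1L << docId) selects bit (docId % 64) within the word.
        return (words[docId >>> 6] & (1L << docId)) != 0;
    }

    /** Int compare: one int load, then a compare against the reader's sequence ID. */
    static boolean isDeletedSeq(int[] deletedAtSeq, int docId, int readerSeq) {
        // A doc is deleted for this reader if its delete happened
        // at or before the sequence ID the reader was opened at.
        return deletedAtSeq[docId] <= readerSeq;
    }
}
```

Both are a single array load plus a cheap operation, so the
per-doc cost should be in the same ballpark. The int array does
touch 32X more memory per doc, though, which could matter for
cache behavior on large segments.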

On Tue, Jul 20, 2010 at 10:03 AM, Michael McCandless
<[email protected]> wrote:
> Breaking up RT patches into baby steps would be great :)  Actually is
> the RT branch active (I haven't seen commits going in).
>
> Eg, is per-segment DocWriter separable from the RT changes (seems like
> it should/could be)?
>
> The biggest downside of sequence IDs is increase RAM usage right?  Ie,
> today each deletion takes 1 bit, but with sequence IDs it's 32X bigger
> (an int), I think?  Are there other downsides?
>
> Then, checking if a doc is deleted becomes an int compare instead of a
> bit lookup, right?  And, we don't have to clone the deletions during
> reopen.
>
> So this is an appropriate tradeoff for apps that need to reopen after
> every change to the index.  But for apps reopening less often (eg
> maybe up to 10X per second), this may not be a good tradeoff (ie they
> are willing to spend more time in the reopen if it reduces RAM
> footprint).  Maybe the deletes impl should be pluggable and apps can
> pick...
>
> Mike
>
> On Tue, Jul 20, 2010 at 12:33 PM, Jason Rutherglen
> <[email protected]> wrote:
>> Michael B and I have been discussing the per segment doc writers
>> and RT patches/branch. A small improvement we can add to trunk
>> from this is the sequence IDs for deletes, which would improve
>> the existing NRT system by avoiding the cloning of bit vectors.
>> Implementing segment deleted docs via sequence IDs would
>> additionally provide a path way for the future RT branch merge
>> into trunk. It could be best to break up the RT patches as much
>> as possible as they touch on many parts of the Lucene
>> IndexWriter subsystem.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>

