Then why not always write segment.delXXXX, where XXXX is incremented? Each file can be written compressed or uncompressed, depending on the number of deletions it contains.
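Roughly what I have in mind - a sketch only, against the 2.x store API; the naming scheme, crossover threshold, and header flags here are made up for illustration, not an actual file format:

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexOutput;

class DeletionsWriter {
  static final byte SPARSE = 0;  // delta-coded doc ids ("only the ones")
  static final byte DENSE  = 1;  // raw bit vector

  /** Write segment.delXXXX, picking the cheaper encoding. */
  void write(Directory dir, String segment, int gen,
             BitSet deleted, int maxDoc) throws IOException {
    String name = segment + ".del" + gen;      // gen increments per session
    IndexOutput out = dir.createOutput(name);
    try {
      int count = deleted.cardinality();
      if (count < maxDoc / 8) {                // rough crossover vs. bit vector size
        out.writeByte(SPARSE);
        out.writeVInt(count);
        int prev = 0;
        for (int doc = deleted.nextSetBit(0); doc >= 0;
             doc = deleted.nextSetBit(doc + 1)) {
          out.writeVInt(doc - prev);           // small deltas stay 1-2 bytes
          prev = doc;
        }
      } else {
        out.writeByte(DENSE);
        out.writeVInt(maxDoc);
        byte[] bits = new byte[(maxDoc + 7) / 8];
        for (int doc = deleted.nextSetBit(0); doc >= 0;
             doc = deleted.nextSetBit(doc + 1)) {
          bits[doc >> 3] |= 1 << (doc & 7);
        }
        out.writeBytes(bits, bits.length);
      }
    } finally {
      out.close();
    }
  }
}

On open, a reader would fold each surviving .delXXXX file, oldest to newest, into the in-memory bitset.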

It should not matter in the high-performance "server mode" case, since the deletion data for the segment should be in memory anyway, so there is no need to read all of the files (during merging). They are only there for crash recovery.

If you are using Lucene in the "shared networking" mode, performance is probably not a huge concern anyway, so reading the multiple files when opening the segment is an acceptable penalty.

Or the policy can be pluggable, and the "shared" mode can use the old bitset method.
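Something along these lines - a hypothetical hook, not an existing Lucene extension point:

interface DeletedDocsPolicy {
  /**
   * Return true to append a small per-session delta file
   * ("server mode"); return false to rewrite the single
   * bit vector file, as the current format does ("shared" mode).
   */
  boolean useIncrementalDeltas(int maxDoc, int numDeletions);
}

Server-mode deployments pay nothing extra, since the merged deletions already live in memory; shared deployments keep today's one-file-per-segment behavior.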

On Jan 8, 2009, at 12:04 AM, Marvin Humphrey wrote:

On Wed, Jan 07, 2009 at 10:36:01PM -0600, robert engels wrote:
Yes, and I don't think the "worst-case" is correct.

When you go to write that segment and determine that it is a "large"
segment but has few deletions (one in this case), it will be written
compressed in probably less than 10 bytes (1-byte header, VLong
start, VInt length - you only write the ones...)...

If a segment slowly accumulates deletions over time during different indexing sessions, at some point you will cross the threshold where the deletions file needs to be written out as an uncompressed bit vector. From then on, adding a single additional deletion to the segment during any subsequent indexing session triggers what I'm calling "worst-case" behavior: the whole bit vector file needs to be rewritten for the sake of a single deletion.

Marvin Humphrey
