Then why not always write segment.delXXXX, where XXXX is incremented? Each file can be written compressed or uncompressed, depending on the number of deletions it contains.
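Roughly what I have in mind - a sketch only, against the 2.x store API; the naming scheme, crossover threshold, and header flags here are made up for illustration, not an actual file format:

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexOutput;

class DeletionsWriter {
  static final byte SPARSE = 0;  // delta-coded doc ids ("only the ones")
  static final byte DENSE  = 1;  // raw bit vector

  /** Write segment.delXXXX, picking the cheaper encoding. */
  void write(Directory dir, String segment, int gen,
             BitSet deleted, int maxDoc) throws IOException {
    String name = segment + ".del" + gen;      // gen increments per session
    IndexOutput out = dir.createOutput(name);
    try {
      int count = deleted.cardinality();
      if (count < maxDoc / 8) {                // rough crossover vs. bit vector size
        out.writeByte(SPARSE);
        out.writeVInt(count);
        int prev = 0;
        for (int doc = deleted.nextSetBit(0); doc >= 0;
             doc = deleted.nextSetBit(doc + 1)) {
          out.writeVInt(doc - prev);           // small deltas stay 1-2 bytes
          prev = doc;
        }
      } else {
        out.writeByte(DENSE);
        out.writeVInt(maxDoc);
        byte[] bits = new byte[(maxDoc + 7) / 8];
        for (int doc = deleted.nextSetBit(0); doc >= 0;
             doc = deleted.nextSetBit(doc + 1)) {
          bits[doc >> 3] |= 1 << (doc & 7);
        }
        out.writeBytes(bits, bits.length);
      }
    } finally {
      out.close();
    }
  }
}

On open, a reader would fold each surviving .delXXXX file, oldest to newest, into the in-memory bitset.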

It should not matter in the high-performance "server mode" case, since the deletion data for the segment should be in memory anyway, so there is no need to read all of the files (during merging). They are only there for crash recovery.

If you are using Lucene in the "shared networking" mode, performance is probably not a huge concern anyway, so reading the multiple files when opening the segment is an acceptable penalty.

Or the policy can be pluggable, and the "shared" mode can use the old bitset method.
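Something along these lines - a hypothetical hook, not an existing Lucene extension point:

interface DeletedDocsPolicy {
  /**
   * Return true to append a small per-session delta file
   * ("server mode"); return false to rewrite the single
   * bit vector file, as the current format does ("shared" mode).
   */
  boolean useIncrementalDeltas(int maxDoc, int numDeletions);
}

Server-mode deployments pay nothing extra, since the merged deletions already live in memory; shared deployments keep today's one-file-per-segment behavior.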

On Jan 8, 2009, at 12:04 AM, Marvin Humphrey wrote:

On Wed, Jan 07, 2009 at 10:36:01PM -0600, robert engels wrote:
Yes, and I don't think the "worst-case" is correct.

When you go to write that segment and determine that it is a "large"
segment but has few deletions (one in this case), it will be written
compressed in probably less than 10 bytes (1-byte header, VLong
start, VInt length - you only write the ones...)...

If a segment slowly accumulates deletions over time during different indexing sessions, at some point you will cross the threshold where the deletions file needs to be written out as an uncompressed bit vector. From then on, adding a single additional deletion to the segment during any subsequent indexing session triggers what I'm calling "worst-case" behavior: the whole bit vector file needs to be rewritten for the sake of a single deletion.

Marvin Humphrey
