On Wed, Jan 07, 2009 at 09:28:40PM -0600, robert engels wrote:

> Why not just write the first byte as 0 for a bit set, and 1 for a
> sparse bit set (compressed), and make the determination when writing
> based on the segment size and/or number of set bits.
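
If I follow, the proposal amounts to a write-time check along these
lines. The class, the 1/32 threshold, and the encoding in this sketch
are invented for illustration and aren't actual Lucene code:

    // Hypothetical sketch of the proposed write-time heuristic; names and
    // thresholds are made up for illustration, not actual Lucene code.
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.BitSet;

    class DeletionsWriter {
        // Flag bytes distinguishing the two on-disk representations.
        static final byte FORMAT_BITSET = 0;
        static final byte FORMAT_SPARSE = 1;

        void write(DataOutput out, BitSet deletions, int maxDoc)
                throws IOException {
            int numDeleted = deletions.cardinality();
            // Use the uncompressed bit vector only when enough bits are set
            // to justify it; the 1/32 cutoff is an arbitrary placeholder.
            if (numDeleted > maxDoc / 32) {
                out.writeByte(FORMAT_BITSET);
                for (int i = 0; i < maxDoc; i += 8) {
                    int b = 0;
                    for (int j = 0; j < 8 && i + j < maxDoc; j++) {
                        if (deletions.get(i + j)) {
                            b |= 1 << j;
                        }
                    }
                    out.writeByte(b);
                }
            } else {
                out.writeByte(FORMAT_SPARSE);
                out.writeInt(numDeleted);
                // Delta-encode the sorted doc ids so the file scales with
                // the number of deletions rather than with segment size.
                int prev = 0;
                for (int doc = deletions.nextSetBit(0); doc >= 0;
                     doc = deletions.nextSetBit(doc + 1)) {
                    out.writeInt(doc - prev);  // a real format would use VInts
                    prev = doc;
                }
            }
        }
    }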
Are you offering that as a solution to the problem I described here?

> > When you make deletes with the BitSet model, you have to rewrite
> > files that scale with segment size, regardless of how few deletions
> > you make.  Deletion of a single document in a large segment may
> > necessitate writing out a substantial bit vector file.
> >
> > In contrast, i/o throughput for writing out a tombstone file scales
> > with the number of tombstones.

Worst-case i/o costs don't improve under such a regime.  You could still
end up writing a large, uncompressed bit vector file to accommodate a
single deletion.

I suppose that has to be weighed against the search-time cost of
interleaving the tombstone streams.  We can pay the interleaving penalty
either at index-time or at search-time.  It's annoying to write out a
1 MB uncompressed bit vector file for a single deleted doc against an
8-million-doc segment, but if there are enough deletions to justify an
uncompressed file, iterating through them via tombstone streams merged
on the fly would be annoying too.

Marvin Humphrey
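
P.S. For concreteness, here's a rough sketch of one way the search-time
merge of tombstone streams could work. The class names and the
priority-queue approach are just an illustration, not actual Lucene code:

    // Hypothetical sketch of merging several tombstone streams on the fly
    // at search time; names are invented for illustration only.
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    class MergedTombstones {
        // Each tombstone file contributes one ascending stream of deleted
        // doc ids.  A small priority queue interleaves them into a single
        // sorted stream; that per-deletion heap work is the search-time
        // penalty being weighed above.  (A doc deleted in more than one
        // tombstone file would be emitted once per stream; a real merge
        // would de-dupe.)
        static Iterator<Integer> merge(List<Iterator<Integer>> streams) {
            PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
            for (int i = 0; i < streams.size(); i++) {
                Iterator<Integer> s = streams.get(i);
                if (s.hasNext()) {
                    heap.add(new int[] { s.next(), i });
                }
            }
            return new Iterator<Integer>() {
                public boolean hasNext() {
                    return !heap.isEmpty();
                }
                public Integer next() {
                    int[] top = heap.poll();
                    Iterator<Integer> s = streams.get(top[1]);
                    if (s.hasNext()) {
                        heap.add(new int[] { s.next(), top[1] });
                    }
                    return top[0];
                }
            };
        }
    }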