So is this at least a decent candidate for a feature request ticket?
On Tue, Feb 13, 2018 at 8:09 PM, Carl Mueller <carl.muel...@smartthings.com>
> I'm particularly interested in getting the tombstones to "promote" up the
> levels of LCS more quickly. Currently they get attached at the low level
> and don't propagate up to higher levels until enough activity at a lower
> level promotes the data. Meanwhile, LCS means compactions can occur in
> parallel at each level. So row tombstones in their own sstables could be
> promoted up the LCS levels preferentially, before normal processes would
> move them up.
> So if the delete-only sstables could move up more quickly, compaction at
> those levels would happen more quickly.
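The promotion idea above can be sketched roughly as a candidate-ordering rule. All names here are invented for illustration; this is not Cassandra's actual compaction code, just a toy model of "prefer delete-only sstables when promoting".

```python
# Toy sketch of the proposal: when picking sstables to promote to the next
# LCS level, prefer delete-only sstables so their row tombstones reach the
# overlapping data at higher levels sooner. Hypothetical structures only.
from dataclasses import dataclass

@dataclass
class SSTable:
    name: str
    level: int
    tombstone_only: bool  # contains nothing but row tombstones

def promotion_candidates(sstables, level):
    """Order candidates at `level` so delete-only tables come first."""
    at_level = [s for s in sstables if s.level == level]
    # not tombstone_only -> False sorts before True, so delete-only first;
    # Python's sort is stable, so data sstables keep their relative order.
    return sorted(at_level, key=lambda s: not s.tombstone_only)

tables = [
    SSTable("data-1", 1, False),
    SSTable("del-1", 1, True),
    SSTable("data-2", 1, False),
]
order = promotion_candidates(tables, 1)
print([t.name for t in order])  # delete-only table first
```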
> The threshold stuff is nice if I read 7019 correctly, but what is the %
> there? % of rows? % of columns? Or % of the size of the sstable? Row
> tombstones are pretty compact, being just the rowkey and the tombstone
> marker. So if 7019 is triggered at 10% of the sstable size, even a crapton
> of tombstones deleting practically the entire database would only be a
> small % of the sstable's size.
> Since the row tombstones are so compact, that's why I think they are good
> candidates for special handling.
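On the "what is the %" question: as documented, Cassandra's `tombstone_threshold` compaction subproperty is a ratio of droppable tombstones to the estimated cell count, not a fraction of sstable bytes. A simplified sketch of that kind of check (illustrative only, not the real implementation):

```python
# Rough sketch in the spirit of Cassandra's tombstone_threshold subproperty,
# which compares droppable tombstones to estimated cells rather than to
# sstable size on disk. Simplified illustration, not Cassandra's code.
def exceeds_tombstone_threshold(droppable_tombstones, total_cells, threshold=0.2):
    if total_cells == 0:
        return False
    return droppable_tombstones / total_cells > threshold

# A row tombstone is tiny on disk but still counts as a tombstone here,
# so a delete-heavy sstable can cross the threshold even though the
# tombstones are a small fraction of the sstable's byte size.
print(exceeds_tombstone_threshold(3000, 10000))  # True: 30% > 20%
```

If the ratio really is cell-based rather than byte-based, that works in favor of the compact-row-tombstone case described above.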
> On Tue, Feb 13, 2018 at 5:22 PM, J. D. Jordan <jeremiah.jor...@gmail.com>
>> Have you taken a look at the new stuff introduced by
>> https://issues.apache.org/jira/browse/CASSANDRA-7019 ? I think it may
>> go a ways to reducing the need for something complicated like this.
>> Though it is an interesting idea as special handling for bulk deletes.
>> If they were truly just sstables that only contained deletes, the logic
>> from 7019 would probably go a long way. Though if you are bulk inserting
>> deletes, that is what you would end up with, so maybe it already works.
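For concreteness, the 7019-era behavior is enabled through compaction subproperties. Something like the following (keyspace/table and values are placeholders, and this is a hedged reading of the ticket, not a verified recipe) would turn it on for an LCS table:

```sql
-- Placeholders throughout. tombstone_threshold and
-- unchecked_tombstone_compaction are long-standing compaction
-- subproperties; provide_overlapping_tombstones (NONE / ROW / CELL)
-- appears to be the option 7019 adds.
ALTER TABLE my_ks.my_table WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'tombstone_threshold': '0.2',
  'unchecked_tombstone_compaction': 'true',
  'provide_overlapping_tombstones': 'ROW'
};
```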
>> > On Feb 13, 2018, at 6:04 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>> > On Tue, Feb 13, 2018 at 2:38 PM, Carl Mueller <
>> > wrote:
>> >> I'm in the process of doing my second major data purge from a Cassandra
>> >> system. Almost all of my purging is done via row tombstones. While
>> >> performing the second purge and trying to cajole compaction (in 2.1.x,
>> >> LeveledCompaction) to goddamn actually compact the data, I've been
>> >> wondering why there isn't a separate set of sstable infrastructure for
>> >> row-deletion tombstones.
>> >> I'm imagining that row tombstones are written to separate sstables than
>> >> mainline data updates/appends and range/column tombstones.
>> >> By writing them to separate sstables, the compaction systems can
>> >> preferentially merge / process them when compacting sstables.
>> >> This would create an additional sstable for lookup in the bloom filters,
>> >> granted. I had visions of short-circuiting the lookups to other sstables
>> >> if a row tombstone was present in one of the special row tombstone
>> >> sstables.
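The short-circuit idea can be sketched as a toy read path: consult the hypothetical dedicated row-tombstone sstables first, then skip any data sstable whose newest write is older than the covering tombstone. All structures are invented for illustration; Cassandra keeps no separate tombstone sstable set today.

```python
# Toy read path for the "check tombstone sstables first" idea.
# tombstone_tables: list of {rowkey: tombstone_timestamp}
# data_tables: list of {"max_ts": newest write ts, "rows": {rowkey: (ts, value)}}
def read_row(key, tombstone_tables, data_tables):
    # Newest row tombstone covering this key across the tombstone sstables.
    delete_ts = max((t.get(key, -1) for t in tombstone_tables), default=-1)
    newest = None
    for table in data_tables:
        if table["max_ts"] <= delete_ts:
            continue  # short circuit: nothing in this sstable can survive
        cell = table["rows"].get(key)
        if cell and cell[0] > delete_ts and (newest is None or cell[0] > newest[0]):
            newest = cell
    return newest[1] if newest is not None else None

tombstones = [{"k1": 100}]
data = [
    {"max_ts": 90, "rows": {"k1": (90, "old")}},    # skipped outright
    {"max_ts": 110, "rows": {"k1": (110, "new")}},  # survives the tombstone
]
print(read_row("k1", tombstones, data))  # "new"
```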
>> > All of the above sounds really interesting to me, but I suspect it's a
>> > lot of work to make it happen correctly.
>> > You'd almost end up with 2 sets of logs for the LSM - a tombstone
>> > log/generation and a data log/generation - and the tombstone logs would
>> > be read-only inputs to data compactions.
>> >> But that would only be possible if there was the notion of a "super row
>> >> tombstone" that permanently deleted a rowkey, so that all future writes
>> >> would be invalidated. Kind of like how a tombstone with a mistakenly huge
>> >> timestamp becomes a sneaky permanent tombstone, but intended. There could
>> >> be a special operation / statement to undo this permanent tombstone, and
>> >> since the row tombstones would be in their own dedicated sstables, they
>> >> could process and compact more quickly, with prioritization by the
>> >> compactor.
>> > This part sounds way less interesting to me (other than the fact that
>> > you can already do this with a timestamp in the future, but it'll gc away
>> > at some point).
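For concreteness, the existing workaround mentioned here is just a delete issued with a far-future writetime, which shadows any later write made with an ordinary timestamp. Table, key, and the exact timestamp below are placeholders:

```sql
-- Placeholder table/key. The timestamp is in microseconds since epoch,
-- chosen far enough in the future to outlive any normal write, which makes
-- the tombstone behave like a (temporary) "super tombstone".
DELETE FROM my_ks.my_table USING TIMESTAMP 9999999999999999
WHERE rowkey = 'some-key';
```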
>> >> I'm thinking there must be something I am forgetting in the
>> >> read/write/compaction paths that invalidates this.
>> > There are a lot of places where we do "smart" things to make sure we
>> > don't accidentally resurrect data. The read path includes old sstables,
>> > for example. Those all need to be concretely identified and handled (and
>> > tested).