Worth a JIRA, yes
On Wed, Feb 14, 2018 at 9:45 AM, Carl Mueller <carl.muel...@smartthings.com> wrote:

> So is this at least a decent candidate for a feature request ticket?
>
> On Tue, Feb 13, 2018 at 8:09 PM, Carl Mueller <carl.muel...@smartthings.com> wrote:
>
>> I'm particularly interested in getting the tombstones to "promote" up the
>> levels of LCS more quickly. Currently they get attached at the low level
>> and don't propagate up to higher levels until enough activity at a lower
>> level promotes the data. Meanwhile, LCS means compactions can occur in
>> parallel at each level. So row tombstones in their own sstables could be
>> promoted up the LCS levels preferentially, before normal processes would
>> move them up.
>>
>> If the delete-only sstables could move up more quickly, the compaction at
>> each level would happen more quickly.
>>
>> The threshold stuff is nice, if I read 7019 correctly, but what is the %
>> there? % of rows? % of columns? Or % of the size of the sstable? Row
>> tombstones are pretty compact, being just the row key and the tombstone
>> marker. So even if 7019 is triggered at 10% of the sstable size, a crapton
>> of tombstones deleting practically the entire database could still be only
>> a small % of the sstable's size.
>>
>> Since row tombstones are so compact, I think they are good candidates for
>> special handling.
>>
>> On Tue, Feb 13, 2018 at 5:22 PM, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>>
>>> Have you taken a look at the new stuff introduced by
>>> https://issues.apache.org/jira/browse/CASSANDRA-7019 ? I think it may go
>>> a ways to reducing the need for something complicated like this. Though
>>> it is an interesting idea as special handling for bulk deletes. If they
>>> were truly just sstables that only contained deletes, the logic from 7019
>>> would probably go a long way. Though if you are bulk-inserting deletes,
>>> that is what you would end up with, so maybe it already works.
>>>
>>> -Jeremiah
>>>
>>>> On Feb 13, 2018, at 6:04 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>
>>>> On Tue, Feb 13, 2018 at 2:38 PM, Carl Mueller <carl.muel...@smartthings.com> wrote:
>>>>
>>>>> I'm in the process of doing my second major data purge from a Cassandra
>>>>> system.
>>>>>
>>>>> Almost all of my purging is done via row tombstones. While performing
>>>>> this the second time, and trying to cajole compaction (in 2.1.x,
>>>>> LeveledCompaction) to actually compact the data, I've been thinking
>>>>> about why there isn't a separate set of sstable infrastructure set up
>>>>> for row deletion tombstones.
>>>>>
>>>>> I'm imagining that row tombstones are written to separate sstables from
>>>>> mainline data updates/appends and range/column tombstones.
>>>>>
>>>>> By writing them to separate sstables, the compaction systems can
>>>>> preferentially merge / process them when compacting sstables.
>>>>>
>>>>> This would create an additional sstable for lookup in the bloom
>>>>> filters, granted. I had visions of short-circuiting the lookups to
>>>>> other sstables if a row tombstone was present in one of the special
>>>>> row tombstone sstables.
>>>>
>>>> All of the above sounds really interesting to me, but I suspect it's a
>>>> LOT of work to make it happen correctly.
>>>>
>>>> You'd almost end up with two sets of logs for the LSM - a tombstone
>>>> log/generation and a data log/generation, and the tombstone logs would
>>>> be read-only inputs to data compactions.
>>>>
>>>>> But that would only be possible if there was the notion of a "super row
>>>>> tombstone" that permanently deleted a row key and invalidated all
>>>>> future writes. Kind of like how a tombstone with a mistakenly huge
>>>>> timestamp becomes a sneaky permanent tombstone, but intended. There
>>>>> could be a special operation / statement to undo this permanent
>>>>> tombstone, and since the row tombstones would be in their own dedicated
>>>>> sstables, they could process and compact more quickly, with
>>>>> prioritization by the compactor.
>>>>
>>>> This part sounds way less interesting to me (other than the fact that
>>>> you can already do this with a timestamp in the future, but it'll gc
>>>> away at gcgs).
>>>>
>>>>> I'm thinking there must be something I am forgetting in the
>>>>> read/write/compaction paths that invalidates this.
>>>>
>>>> There are a lot of places where we do "smart" things to make sure we
>>>> don't accidentally resurrect data. The read path includes old sstables
>>>> for tombstones, for example. Those all need to be concretely identified
>>>> and handled (and tested).
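Carl's point about a size-based trigger under-counting row tombstones can be made concrete with back-of-the-envelope arithmetic. The byte sizes below are illustrative assumptions, not measured Cassandra on-disk numbers:

```python
# Sketch: why a size-based tombstone threshold can under-count row
# (partition) tombstones. All byte sizes are illustrative assumptions.

def tombstone_size_fraction(num_tombstones, avg_key_bytes, marker_bytes,
                            sstable_bytes):
    """Fraction of an sstable's size taken up by row tombstones."""
    tombstone_bytes = num_tombstones * (avg_key_bytes + marker_bytes)
    return tombstone_bytes / sstable_bytes

# Suppose an sstable holds 1,000,000 rows averaging 1 KiB each (~1 GiB),
# and 900,000 of them are deleted via row tombstones of ~30 bytes apiece
# (key plus deletion marker). The deletes cover 90% of the rows...
frac = tombstone_size_fraction(
    num_tombstones=900_000,
    avg_key_bytes=20,
    marker_bytes=10,
    sstable_bytes=1_000_000 * 1024,
)

# ...yet by *size* they are under 3% of the sstable, well below a
# hypothetical 10% size-based trigger.
print(f"{frac:.2%}")
```

Under these assumed numbers, deleting 90% of the data registers as under 3% by size, which is the mismatch Carl is pointing at if the 7019 threshold were size-based rather than row- or cell-based.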
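Jeff's aside about future timestamps rests on last-write-wins reconciliation: the cell or deletion with the highest timestamp shadows everything else, so a tombstone written with an artificially far-future timestamp suppresses every realistic later write until gc_grace_seconds lets compaction purge it. A toy model of that rule (not Cassandra's actual code; the names here are invented for illustration):

```python
# Toy last-write-wins reconciliation, modeling why a far-future-timestamp
# tombstone acts like the "super row tombstone" discussed above.
# Not Cassandra's real code; names are invented for illustration.

TOMBSTONE = object()  # sentinel for "this row was deleted"

def reconcile(versions):
    """Pick the winner among (timestamp, value) pairs; highest timestamp
    wins. Returns None when the winner is a tombstone."""
    ts, value = max(versions, key=lambda v: v[0])
    return None if value is TOMBSTONE else value

# Normal delete: a later write resurrects the row, as expected.
assert reconcile([(100, "v1"), (200, TOMBSTONE), (300, "v2")]) == "v2"

# Far-future tombstone: every realistic later write loses reconciliation,
# so the row stays deleted...
FAR_FUTURE = 10**18
assert reconcile([(100, "v1"), (FAR_FUTURE, TOMBSTONE), (300, "v2")]) is None

# ...but once gc_grace_seconds elapses and compaction drops the tombstone,
# any still-shadowed writes sitting in other sstables would reappear -
# which is Jeff's "it'll gc away at gcgs" caveat.
```

This is why the trick works today but is not a safe "permanent delete": the shadowing lasts only as long as the tombstone itself survives garbage collection.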