Worth a JIRA, yes
On Wed, Feb 14, 2018 at 9:45 AM, Carl Mueller <carl.muel...@smartthings.com> wrote:

> So is this at least a decent candidate for a feature request ticket?
>
> On Tue, Feb 13, 2018 at 8:09 PM, Carl Mueller <carl.muel...@smartthings.com> wrote:
>
>> I'm particularly interested in getting the tombstones to "promote" up the
>> levels of LCS more quickly. Currently they get attached at the low level
>> and don't propagate up to higher levels until enough activity at a lower
>> level promotes the data. Meanwhile, LCS means compactions can occur in
>> parallel at each level. So row tombstones in their own sstables could be
>> promoted up the LCS levels preferentially, before normal processes would
>> move them up.
>>
>> If the delete-only sstables could move up more quickly, the compaction at
>> each level would happen more quickly.
>>
>> The threshold stuff is nice, if I read 7019 correctly, but what is the %
>> there? % of rows? % of columns? Or % of the size of the sstable? Row
>> tombstones are pretty compact, being just the row key and the tombstone
>> marker. So even if 7019 is triggered at 10% of the sstable size, a crapton
>> of tombstones deleting practically the entire database could still be only
>> a small % of the sstable's size.
>>
>> Since row tombstones are so compact, I think they are good candidates for
>> special handling.
>>
>> On Tue, Feb 13, 2018 at 5:22 PM, J. D. Jordan <jeremiah.jor...@gmail.com> wrote:
>>
>>> Have you taken a look at the new stuff introduced by
>>> https://issues.apache.org/jira/browse/CASSANDRA-7019 ? I think it may go
>>> a ways to reducing the need for something complicated like this. Though
>>> it is an interesting idea as special handling for bulk deletes. If they
>>> were truly just sstables that only contained deletes, the logic from 7019
>>> would probably go a long way. Though if you are bulk-inserting deletes,
>>> that is what you would end up with, so maybe it already works.
>>>
>>> -Jeremiah
>>>
>>>> On Feb 13, 2018, at 6:04 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>>>>
>>>> On Tue, Feb 13, 2018 at 2:38 PM, Carl Mueller <carl.muel...@smartthings.com> wrote:
>>>>
>>>>> I'm in the process of doing my second major data purge from a Cassandra
>>>>> system.
>>>>>
>>>>> Almost all of my purging is done via row tombstones. While performing
>>>>> this the second time, and trying to cajole compaction (in 2.1.x,
>>>>> LeveledCompaction) to actually compact the data, I've been thinking
>>>>> about why there isn't a separate set of sstable infrastructure set up
>>>>> for row deletion tombstones.
>>>>>
>>>>> I'm imagining that row tombstones are written to separate sstables from
>>>>> mainline data updates/appends and range/column tombstones.
>>>>>
>>>>> By writing them to separate sstables, the compaction systems can
>>>>> preferentially merge / process them when compacting sstables.
>>>>>
>>>>> This would create an additional sstable for lookup in the bloom
>>>>> filters, granted. I had visions of short-circuiting the lookups to
>>>>> other sstables if a row tombstone was present in one of the special
>>>>> row tombstone sstables.
>>>>
>>>> All of the above sounds really interesting to me, but I suspect it's a
>>>> LOT of work to make it happen correctly.
>>>>
>>>> You'd almost end up with two sets of logs for the LSM - a tombstone
>>>> log/generation and a data log/generation, and the tombstone logs would
>>>> be read-only inputs to data compactions.
>>>>
>>>>> But that would only be possible if there was the notion of a "super row
>>>>> tombstone" that permanently deleted a row key and invalidated all
>>>>> future writes. Kind of like how a tombstone with a mistakenly huge
>>>>> timestamp becomes a sneaky permanent tombstone, but intended. There
>>>>> could be a special operation / statement to undo this permanent
>>>>> tombstone, and since the row tombstones would be in their own dedicated
>>>>> sstables, they could process and compact more quickly, with
>>>>> prioritization by the compactor.
>>>>
>>>> This part sounds way less interesting to me (other than the fact that
>>>> you can already do this with a timestamp in the future, but it'll gc
>>>> away at gcgs).
>>>>
>>>>> I'm thinking there must be something I am forgetting in the
>>>>> read/write/compaction paths that invalidates this.
>>>>
>>>> There are a lot of places where we do "smart" things to make sure we
>>>> don't accidentally resurrect data. The read path includes old sstables
>>>> for tombstones, for example. Those all need to be concretely identified
>>>> and handled (and tested).
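Carl's point about a size-based trigger under-counting row tombstones can be made concrete with back-of-the-envelope arithmetic. The byte sizes below are illustrative assumptions, not measured Cassandra on-disk numbers:

```python
# Sketch: why a size-based tombstone threshold can under-count row
# (partition) tombstones. All byte sizes are illustrative assumptions.

def tombstone_size_fraction(num_tombstones, avg_key_bytes, marker_bytes,
                            sstable_bytes):
    """Fraction of an sstable's size taken up by row tombstones."""
    tombstone_bytes = num_tombstones * (avg_key_bytes + marker_bytes)
    return tombstone_bytes / sstable_bytes

# Suppose an sstable holds 1,000,000 rows averaging 1 KiB each (~1 GiB),
# and 900,000 of them are deleted via row tombstones of ~30 bytes apiece
# (key plus deletion marker). The deletes cover 90% of the rows...
frac = tombstone_size_fraction(
    num_tombstones=900_000,
    avg_key_bytes=20,
    marker_bytes=10,
    sstable_bytes=1_000_000 * 1024,
)

# ...yet by *size* they are under 3% of the sstable, well below a
# hypothetical 10% size-based trigger.
print(f"{frac:.2%}")
```

Under these assumed numbers, deleting 90% of the data registers as under 3% by size, which is the mismatch Carl is pointing at if the 7019 threshold were size-based rather than row- or cell-based.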
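Jeff's aside about future timestamps rests on last-write-wins reconciliation: the cell or deletion with the highest timestamp shadows everything else, so a tombstone written with an artificially far-future timestamp suppresses every realistic later write until gc_grace_seconds lets compaction purge it. A toy model of that rule (not Cassandra's actual code; the names here are invented for illustration):

```python
# Toy last-write-wins reconciliation, modeling why a far-future-timestamp
# tombstone acts like the "super row tombstone" discussed above.
# Not Cassandra's real code; names are invented for illustration.

TOMBSTONE = object()  # sentinel for "this row was deleted"

def reconcile(versions):
    """Pick the winner among (timestamp, value) pairs; highest timestamp
    wins. Returns None when the winner is a tombstone."""
    ts, value = max(versions, key=lambda v: v[0])
    return None if value is TOMBSTONE else value

# Normal delete: a later write resurrects the row, as expected.
assert reconcile([(100, "v1"), (200, TOMBSTONE), (300, "v2")]) == "v2"

# Far-future tombstone: every realistic later write loses reconciliation,
# so the row stays deleted...
FAR_FUTURE = 10**18
assert reconcile([(100, "v1"), (FAR_FUTURE, TOMBSTONE), (300, "v2")]) is None

# ...but once gc_grace_seconds elapses and compaction drops the tombstone,
# any still-shadowed writes sitting in other sstables would reappear -
# which is Jeff's "it'll gc away at gcgs" caveat.
```

This is why the trick works today but is not a safe "permanent delete": the shadowing lasts only as long as the tombstone itself survives garbage collection.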