[
https://issues.apache.org/jira/browse/CASSANDRA-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705974#comment-13705974
]
Sylvain Lebresne commented on CASSANDRA-5745:
---------------------------------------------
bq. if you want to compact "everything that overlaps with one sstable" from L1
In practice, you don't necessarily want to go all the way up. If the 2 sstables
that potentially block each other from getting tombstones removed are in L1 and
L2, that's all that you compact. And for 2 sstables A and B to meet those
criteria, they must 1) overlap and 2) A must contain older data than B and
vice versa. In particular, and discarding the case where people do crazy shit
with their column timestamps, this means that A and B can only meet those
criteria if they are sstables that follow each other in flush order (or are
the result of compacting such sstables). And because sstables can't magically
jump levels at random, this also means those sstables will almost always be in
adjacent levels in practice.
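To make those criteria concrete, here's a rough sketch of the check (SSTableStats and its fields are made-up stand-ins for our per-sstable metadata, not the actual classes):
{code:java}
// Made-up type and field names, for illustration only.
final class SSTableStats
{
    final String firstKey, lastKey;        // key range covered by the sstable
    final long minTimestamp, maxTimestamp; // oldest/newest cell timestamps

    SSTableStats(String firstKey, String lastKey, long minTimestamp, long maxTimestamp)
    {
        this.firstKey = firstKey;
        this.lastKey = lastKey;
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
    }

    // A and B can block each other's tombstone removal only if
    // 1) their key ranges overlap, and 2) each contains data older
    // than the other (its min timestamp is below the other's max).
    static boolean mutuallyBlocking(SSTableStats a, SSTableStats b)
    {
        boolean overlap = a.firstKey.compareTo(b.lastKey) <= 0
                       && b.firstKey.compareTo(a.lastKey) <= 0;
        return overlap
            && a.minTimestamp < b.maxTimestamp
            && b.minTimestamp < a.maxTimestamp;
    }
}
{code}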
This is btw why I don't think the original problem described above is really
a problem in practice. If 2 sstables meet the "deadlock" criteria, they will
be close in levels and will in fact get compacted together relatively quickly,
so I'm not sure you can keep them deadlocked long enough for it to be a
problem in practice.
Besides, I'm all for making sure we don't trigger that new heuristic too often:
we could for instance only do it for sstables that have not been compacted
recently, say within a day (we do something similar already for tombstone
compaction), so that we don't end up triggering that heuristic too eagerly.
And it would only be triggered if we have nothing better to do in the
first place.
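The gating could look something like this (all names hypothetical, this is just the shape of it):
{code:java}
import java.util.concurrent.TimeUnit;

// Hypothetical gate for the new heuristic: only consider an sstable if it
// hasn't been part of a compaction for at least a day, and only when the
// strategy has no regular candidates pending ("nothing better to do").
final class DeadlockCompactionGate
{
    static final long MIN_INTERVAL_MS = TimeUnit.DAYS.toMillis(1);

    static boolean eligible(long lastCompactedAtMs, long nowMs, boolean noPendingRegularCompaction)
    {
        return noPendingRegularCompaction
            && nowMs - lastCompactedAtMs >= MIN_INTERVAL_MS;
    }
}
{code}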
bq. it only does massive amounts of compaction when you explicitly ask for it
But imo, this is dodging the problem. How do you know when you need to trigger
the "big hammer"? Either we say "trigger it if you suspect that you have a
problem", in which case normal users will almost never know when they should
do it. Or we say "trigger it regularly just in case, like for active repair",
in which case I'm -1 on the idea, because I'm pretty sure that in 99.9% of
cases people will just inflict massive I/O on themselves for no reason.
Again, I'm not totally opposed to adding major compaction for LCS (I'm just
not excessively enthusiastic about it); we have it for size-tiered after all,
where it sucks even more. But as far as solving the problem mentioned in the
description is concerned, I'm not convinced at all that it's the right
solution. For that, my preference would be to improve the metrics we expose
on our sstables (things like how often gcable tombstones survive a compaction
over time) and wait to make sure we actually have a practical problem, not
just a theoretical one.
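For the metrics idea, I'm thinking of something along these lines (purely a sketch, not an existing metric):
{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a per-table metric: of the tombstones that were gcable during a
// compaction, how many did we have to keep because an overlapping sstable
// still holds older data? A persistently high ratio over time would be
// evidence that the deadlock described here actually bites in practice.
final class TombstoneSurvivalMetrics
{
    private final AtomicLong gcableSeen = new AtomicLong();
    private final AtomicLong gcableKept = new AtomicLong();

    void onGcableTombstone(boolean dropped)
    {
        gcableSeen.incrementAndGet();
        if (!dropped)
            gcableKept.incrementAndGet();
    }

    double survivalRatio()
    {
        long seen = gcableSeen.get();
        return seen == 0 ? 0.0 : (double) gcableKept.get() / seen;
    }
}
{code}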
> Minor compaction tombstone-removal deadlock
> -------------------------------------------
>
> Key: CASSANDRA-5745
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5745
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jonathan Ellis
> Fix For: 2.0.1
>
>
> From a discussion with Axel Liljencrantz,
> If you have two SSTables that have temporally overlapping data, you can get
> lodged into a state where a compaction of SSTable A can't drop tombstones
> because SSTable B contains older data *and vice versa*. Once that's happened,
> Cassandra should be wedged into a state where CASSANDRA-4671 no longer helps
> with tombstone removal. The only way to break the wedge would be to perform a
> compaction containing both SSTable A and SSTable B.
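A minimal, self-contained illustration of the wedge described there, with invented timestamps:
{code:java}
// sstable A: live data at t=5, plus a tombstone at t=10 shadowing data in B
// sstable B: live data at t=3, plus a tombstone at t=12 shadowing data in A
public final class WedgeExample
{
    public static void main(String[] args)
    {
        long minA = 5, maxA = 10; // oldest data / newest tombstone in A
        long minB = 3, maxB = 12; // oldest data / newest tombstone in B

        // Each sstable holds data older than the other's newest tombstone,
        // so neither compaction alone is allowed to purge its tombstones;
        // only a compaction including both A and B can drop either one.
        boolean wedged = minA < maxB && minB < maxA;
        System.out.println("mutually blocked: " + wedged); // prints true
    }
}
{code}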