[ https://issues.apache.org/jira/browse/CASSANDRA-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705974#comment-13705974 ]

Sylvain Lebresne commented on CASSANDRA-5745:
---------------------------------------------

bq.  if you want to compact "everything that overlaps with one sstable" from L1

In practice, you don't necessarily want to go all the way up. If the 2 sstables 
that potentially block each other from getting tombstones removed are in L1 and 
L2, that's all that you compact. And for 2 sstables A and B to meet that 
criterion, they must 1) overlap and 2) A must contain data older than B and 
vice versa. In particular, and discarding the case where people do crazy things 
with their column timestamps, this means that A and B can only meet that 
criterion if they are sstables that follow each other in order of flush (or the 
result of the compaction of those). And because sstables can't magically jump 
levels randomly, this also means those sstables will pretty much always be in 
subsequent levels in practice.
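To make the criterion concrete, here is a minimal sketch of the mutual-blocking check described above. This is not Cassandra's actual code; the SSTable summary fields and integer keys are hypothetical simplifications for illustration:

```python
from dataclasses import dataclass

@dataclass
class SSTable:
    # Hypothetical per-sstable summary, for illustration only.
    first_key: int        # smallest partition key covered
    last_key: int         # largest partition key covered
    min_timestamp: int    # oldest cell timestamp in the sstable
    max_timestamp: int    # newest cell timestamp in the sstable

def overlaps(a: SSTable, b: SSTable) -> bool:
    # Key ranges intersect, so the two sstables may share partitions.
    return a.first_key <= b.last_key and b.first_key <= a.last_key

def mutually_blocking(a: SSTable, b: SSTable) -> bool:
    # Each sstable contains data older than some data in the other,
    # so compacting either one alone cannot safely drop its tombstones.
    return (overlaps(a, b)
            and a.min_timestamp < b.max_timestamp
            and b.min_timestamp < a.max_timestamp)
```

With absolute timestamps and no timestamp games by the client, only sstables flushed (or compacted) back-to-back can interleave like this, which is why the pair ends up in adjacent levels.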

This is btw why I think the original problem described above is not really a 
problem in practice. If 2 sstables meet the "deadlock" criterion, they will be 
close in levels and will in fact get compacted together relatively quickly, so 
I'm not sure you can keep them deadlocked long enough for it to matter.

Besides, I'm all for making sure we don't trigger that new heuristic too often: 
we could for instance only apply it to sstables that have not been compacted 
recently, say within a day (we already do something similar for tombstone 
compactions), so that we don't end up triggering the heuristic too eagerly. 
And it would only be triggered if there is nothing better to do in the 
first place.
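The throttling idea could look roughly like this. A hedged sketch only: the function name and the one-day threshold are illustrative, mirroring the interval mentioned above, not an existing Cassandra API:

```python
import time

# Illustrative threshold: only consider sstables untouched for a day.
MIN_IDLE_SECONDS = 24 * 60 * 60

def eligible_for_deadlock_heuristic(last_compacted_at,
                                    has_pending_normal_compactions,
                                    now=None):
    # Only fall back to the heuristic when the normal LCS work queue
    # is empty and the sstable has been idle for long enough.
    now = time.time() if now is None else now
    idle = now - last_compacted_at
    return (not has_pending_normal_compactions) and idle >= MIN_IDLE_SECONDS
```

Gating on both conditions keeps the heuristic a background fallback rather than a new source of compaction pressure.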

bq. it only does massive amounts of compaction when you explicitly ask for it

But imo, this is dodging the problem. How do you know that you need to trigger 
the "big hammer"? If we say "trigger it if you suspect you have a problem", 
then normal users will almost never know when they should do it. If we say 
"trigger it regularly, like for active repair, just in case", then I'm -1 on 
the idea because I'm pretty sure that in 99.9% of cases people will just 
inflict massive I/O on themselves for no reason.

Again, I'm not totally opposed to adding major compaction for LCS (I'm just not 
excessively enthusiastic about it); we have it for size-tiered after all, where 
it sucks even more. But as far as solving the problem mentioned in the 
description is concerned, I'm not convinced at all that it's the right 
solution. For that, my preference would go to improving the metrics we expose 
on our sstables (things like how often gcable tombstones survive a compaction 
over time) and waiting to make sure we actually have a practical problem, not 
just a theoretical one.
                
> Minor compaction tombstone-removal deadlock
> -------------------------------------------
>
>                 Key: CASSANDRA-5745
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5745
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0.1
>
>
> From a discussion with Axel Liljencrantz,
> If you have two SSTables that have temporally overlapping data, you can get 
> lodged into a state where a compaction of SSTable A can't drop tombstones 
> because SSTable B contains older data *and vice versa*. Once that's happened, 
> Cassandra should be wedged into a state where CASSANDRA-4671 no longer helps 
> with tombstone removal. The only way to break the wedge would be to perform a 
> compaction containing both SSTable A and SSTable B. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
