[ 
https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173034#comment-16173034
 ] 

Stefan Podkowinski commented on CASSANDRA-13885:
------------------------------------------------

It's always a potential problem before CASSANDRA-9143, yes. But since only 
unrepaired data is affected, running incremental repairs often enough before 
gc_grace will minimize the chance that an sstable is skipped during 
anti-compaction and remains in the unrepaired set afterwards. And that's what 
incremental repairs are designed for anyway: to be run regularly on new data. 
The important point is that, in the end, all data needs to be successfully 
promoted to the repaired set before gc_grace. Why does that matter? Because 
after gc_grace, deleted data may be compacted away on replicas. But that will 
not happen if the tombstone and the corresponding data end up in different 
repaired/unrepaired sets, as those are never compacted together. Also remember 
that incremental repair only validates sstables in the unrepaired set. As a 
consequence, after the next incremental repair, the data from the unrepaired 
set (but not the tombstone from the repaired set) will be streamed to the 
other replicas, where that data had already been compacted away. 
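To make that failure mode concrete, here is a toy model of the resurrection 
scenario. This is not Cassandra code; every name in it is invented for 
illustration. It only encodes the two rules from the paragraph above: sstables 
in different repaired/unrepaired sets are never compacted together, and 
incremental repair streams only the unrepaired set.

```python
# Toy model of the data-resurrection scenario; not Cassandra code.

def compact(sstables):
    """Within one set, a tombstone shadows (removes) the matching data."""
    tombstoned = {k for k, kind in sstables if kind == "tombstone"}
    return {(k, kind) for k, kind in sstables
            if not (kind == "data" and k in tombstoned)}

def gc_after_grace(sstables):
    """After gc_grace, the tombstones themselves are purged."""
    return {(k, kind) for k, kind in sstables if kind != "tombstone"}

# Replica B: tombstone and data ended up in the same set, so compaction
# removed the data, then gc_grace purged the tombstone. Correct outcome.
b = gc_after_grace(compact({("k1", "tombstone"), ("k1", "data")}))
assert b == set()

# Replica A: the tombstone was promoted to repaired, but the data stayed
# unrepaired. The two sets are never compacted together, so the data
# survives while the tombstone is eventually purged.
a_repaired   = gc_after_grace(compact({("k1", "tombstone")}))  # empty now
a_unrepaired = compact({("k1", "data")})                       # data survives

# The next incremental repair only looks at the unrepaired set and
# streams the zombie data back to replica B.
b |= a_unrepaired
print(b)  # {('k1', 'data')} -- the deleted row is resurrected
```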

So how would this situation change if we did not run anti-compaction (promotion 
to repaired) after full repairs at all? In that case we'd just let the 
unrepaired set grow, which is not a problem on its own. But the operator would 
then be responsible for scheduling incremental repairs often enough to make 
sure the promotion happens before gc_grace, to avoid the potential data 
inconsistency issues described above. The only other way to avoid them would be 
to stop running incremental repairs altogether, which would be fine, too. So 
yes, I guess we could agree in this ticket on the situations in which it would 
be acceptable to run full repairs with a --skip-anticompaction flag, but I'd 
also like to hear how to communicate the correct scheduling to users without 
just handing them a loaded gun. Because currently you can't go wrong by mixing 
full and incremental repairs (as far as I can tell), and we can get away with 
telling people to run any kind of repair at least once before gc_grace, e.g. 
weekly incremental repairs with every n-th run as a full repair.
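A sketch of that cadence, with illustrative values (gc_grace_seconds defaults 
to 10 days; the interval and the n are the operator's choice, not anything 
prescribed by Cassandra):

```python
# Illustrative repair cadence: weekly runs, every n-th one a full repair.
GC_GRACE_DAYS = 10          # default gc_grace_seconds / 86400
REPAIR_INTERVAL_DAYS = 7    # weekly
FULL_EVERY_N = 4            # every 4th run is a full repair

# Each interval between repairs must stay below gc_grace, so every piece
# of data is repaired (and promoted) before its tombstones become purgeable.
assert REPAIR_INTERVAL_DAYS < GC_GRACE_DAYS

schedule = ["full" if i % FULL_EVERY_N == 0 else "incremental"
            for i in range(8)]
print(schedule)
# ['full', 'incremental', 'incremental', 'incremental',
#  'full', 'incremental', 'incremental', 'incremental']
```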

Exclusively running full repairs, even with the anti-compaction at the end, is 
by the way not as broken as you may think. In that situation you simply don't 
care about the unrepaired set. The anti-compaction at the end of the repair is 
wasted work, yes, but it's not that bad performance-wise, as we only have to 
anti-compact the unrepaired data written since the last repair. Not being able 
to run -pr repairs in parallel is an unfortunate side effect of this, but I'd 
still prefer to recommend avoiding -pr in parallel and falling back to 
range-based repairs if the cluster size doesn't allow it. Subrange repairs 
would actually cause the same problems as -pr, but with CASSANDRA-10422 it was 
decided to skip anti-compaction for them, so all the caveats described above 
apply there, although I wouldn't expect users to mix subrange repairs with 
incremental repairs.

> Allow to run full repairs in 3.0 without additional cost of anti-compaction
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13885
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Thomas Steinmaurer
>
> This ticket is basically the result of the discussion in Cassandra user list: 
> https://www.mail-archive.com/[email protected]/msg53562.html
> I was asked to open a ticket by Paulo Motta to think about back-porting 
> running full repairs without the additional cost of anti-compaction.
> Basically there is no way in 3.0 to run full repairs from several nodes 
> concurrently without troubles caused by (overlapping?) anti-compactions. 
> Coming from 2.1 this is a major change from an operational POV, basically 
> breaking any e.g. cron job based solution kicking off -pr based repairs on 
> several nodes concurrently.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
