[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16173034#comment-16173034 ]
Stefan Podkowinski commented on CASSANDRA-13885:
------------------------------------------------
It's always a potential problem before CASSANDRA-9143, yes. But since only
unrepaired data is affected, running incremental repairs often enough before
gc_grace will minimize the chance that an sstable is excluded from
anti-compaction and remains in the unrepaired set afterwards. And that's what
incremental repairs are designed for anyway: to be run regularly on new data.
The important thing is that, in the end, all data must be successfully
promoted to the repaired set before gc_grace. Why is that important? Because
after gc_grace, deleted data may be compacted away on replicas. But this will
not happen if the tombstone and the data it deletes end up in different sets
(repaired vs. unrepaired), as those are never compacted together. Also
remember that incremental repair only validates sstables in the unrepaired
set. As a consequence, after the next incremental repair, the data from the
unrepaired set (but not the tombstone from the repaired set) will be
transferred to the other replicas, where the data had already been compacted
away.
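
The resurrection scenario above can be sketched as a toy model. This is a
minimal sketch under simplifying assumptions, not Cassandra's actual
compaction or repair code: each replica keeps two pools (repaired and
unrepaired) that are never compacted together, an expired tombstone is only
purged if it shadows nothing in the replica's other pool (a simplified
overlap check), and incremental repair compares and streams unrepaired cells
only. `Cell`, `Replica`, `compact_pool` and `incremental_repair` are
hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

# Cassandra's default gc_grace_seconds (10 days); the model uses plain seconds.
GC_GRACE = 864000


@dataclass(frozen=True)
class Cell:
    key: str
    value: Optional[str]  # None marks a tombstone
    ts: int               # write timestamp, in seconds


@dataclass
class Replica:
    name: str
    repaired: set = field(default_factory=set)
    unrepaired: set = field(default_factory=set)

    def compact(self, now: int) -> None:
        # Each pool is compacted on its own; cells never move between pools here.
        everything = self.repaired | self.unrepaired
        self.repaired = compact_pool(self.repaired, everything, now)
        self.unrepaired = compact_pool(self.unrepaired, everything, now)

    def read(self, key: str) -> Optional[str]:
        # Reads merge both pools; the newest cell wins (None means deleted).
        cells = [c for c in self.repaired | self.unrepaired if c.key == key]
        return max(cells, key=lambda c: c.ts).value if cells else None


def compact_pool(pool: set, everything: set, now: int) -> set:
    kept = set()
    for key in {c.key for c in pool}:
        newest = max((c for c in pool if c.key == key), key=lambda c: c.ts)
        expired = newest.value is None and now - newest.ts > GC_GRACE
        # Simplified overlap check (assumption): keep an expired tombstone while
        # it still shadows older data sitting in the other pool of this replica.
        shadows_other_pool = any(
            c.key == key and c.ts < newest.ts and c not in pool for c in everything
        )
        if expired and not shadows_other_pool:
            continue  # purge the tombstone together with what it shadowed
        kept.add(newest)  # older cells for this key in this pool are dropped
    return kept


def incremental_repair(a: Replica, b: Replica) -> None:
    # Only unrepaired sstables are validated; missing cells stream both ways.
    a_only, b_only = a.unrepaired - b.unrepaired, b.unrepaired - a.unrepaired
    b.unrepaired |= a_only
    a.unrepaired |= b_only


data = Cell("k", "v1", ts=0)
tomb = Cell("k", None, ts=100)
# A: the data's sstable was excluded from anti-compaction, the tombstone was promoted.
a = Replica("A", repaired={tomb}, unrepaired={data})
# B: data and tombstone ended up in the same pool.
b = Replica("B", repaired={data, tomb})

now = tomb.ts + GC_GRACE + 1
a.compact(now)
b.compact(now)
before = b.read("k")
incremental_repair(a, b)
print(before, b.read("k"), a.read("k"))  # None v1 None
```

After the repair, replica A still reads the key as deleted while B returns
the old value again, which is the inconsistency described above.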

So how would this situation change if we didn't run anti-compaction (promotion
to repaired) after full repairs at all? In that case we'd just let the
unrepaired set grow, which should not be a problem on its own. But the
operator would be responsible for scheduling incremental repairs often enough
to make sure promotion happens before gc_grace, to avoid the potential data
inconsistency issues described above. The only other way to avoid these would
be to not run incremental repairs at all anymore, which would be fine, too. So
yes, I guess we could agree in this ticket under which situations it would be
acceptable to run full repairs with a --skip-anticompaction flag, but I'd also
like to hear how to communicate the correct scheduling to users without just
handing them a loaded gun. Because currently you can't go wrong by mixing full
and incremental repairs (as far as I can tell), and we can get away with
telling people to run any kind of repair at least once before gc_grace, e.g.
weekly incremental repairs with every n-th one being a full repair.
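
The scheduling rule above can be checked mechanically. A minimal sketch,
assuming weekly repairs where every n-th run is full, a conservative 12-hour
repair duration, and the default gc_grace_seconds of 864000 (10 days).
`build_schedule`, `max_gap` and `within_gc_grace` are hypothetical helpers,
and --skip-anticompaction is only the flag proposed in this ticket, not an
existing option.

```python
from datetime import datetime, timedelta

# Cassandra's default gc_grace_seconds (10 days); adjust per table.
GC_GRACE = timedelta(seconds=864000)


def build_schedule(start, runs, every_nth_full, interval=timedelta(weeks=1)):
    """Hypothetical schedule: a repair every `interval`, every n-th run is full."""
    return [
        (start + i * interval, "full" if (i + 1) % every_nth_full == 0 else "incremental")
        for i in range(runs)
    ]


def max_gap(schedule, kinds):
    """Largest time between consecutive runs whose kind is in `kinds`."""
    times = [t for t, kind in schedule if kind in kinds]
    return max(later - earlier for earlier, later in zip(times, times[1:]))


def within_gc_grace(schedule, kinds, repair_duration=timedelta(hours=12)):
    # Conservative check: data written right after one run starts must be
    # covered by the next run, which may itself take `repair_duration`.
    return max_gap(schedule, kinds) + repair_duration < GC_GRACE


sched = build_schedule(datetime(2017, 9, 1), runs=12, every_nth_full=4)

# Today both kinds promote data to the repaired set, so any run counts.
print(within_gc_grace(sched, {"incremental", "full"}))  # True  (7 days + 12 h)
# If full runs skipped anti-compaction, only incremental runs would promote.
print(within_gc_grace(sched, {"incremental"}))          # False (14 days + 12 h)
```

The same weekly schedule that is safe today would leave a 14-day gap between
promotions once full repairs stop anti-compacting, which is the kind of trap
meant by the loaded gun above.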

Exclusively running full repairs, even with anti-compaction included at the
end, is btw not as broken as you may think. In that situation you simply don't
care about the unrepaired set. The anti-compaction at the end of the repair is
a waste, yes, but it's not that bad performance-wise, as we only have to
anti-compact new unrepaired data written since the last repair. Not being able
to perform parallel -pr repairs is an unfortunate side effect of this, but I'd
still prefer to recommend avoiding parallel -pr repairs and falling back to
range-based repairs if the cluster size doesn't allow this. Subrange repairs
would actually cause the same problems as -pr, but with CASSANDRA-10422 it was
decided to skip anti-compaction for them, so all the caveats described above
apply there, although I wouldn't expect users to mix subrange repairs with
incremental repairs.
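
Why parallel -pr repairs collide can be illustrated with a small model. A
rough sketch, assuming one token per node (no vnodes), SimpleStrategy-style
placement where a node's primary range is also stored on the next rf - 1 nodes
clockwise, and that two -pr repairs interfere whenever they share at least one
replica, since that replica would anti-compact sstables for both sessions.
`replicas_for_primary_range`, `conflicts` and `waves` are hypothetical names.

```python
def replicas_for_primary_range(node, n_nodes, rf):
    """Nodes holding `node`'s primary range on a one-token-per-node ring
    with SimpleStrategy-like placement (an assumption of this sketch)."""
    return {(node + k) % n_nodes for k in range(rf)}


def conflicts(a, b, n_nodes, rf):
    # Assumed rule: two -pr repairs interfere if any replica takes part in both.
    return bool(
        replicas_for_primary_range(a, n_nodes, rf)
        & replicas_for_primary_range(b, n_nodes, rf)
    )


def waves(n_nodes, rf):
    """Greedily group nodes into waves whose -pr repairs share no replica."""
    remaining = list(range(n_nodes))
    result = []
    while remaining:
        wave = []
        for node in remaining:
            if all(not conflicts(node, other, n_nodes, rf) for other in wave):
                wave.append(node)
        result.append(wave)
        remaining = [n for n in remaining if n not in wave]
    return result


print(waves(6, 3))  # [[0, 3], [1, 4], [2, 5]]
print(waves(3, 3))  # [[0], [1], [2]]
```

With 6 nodes and RF=3 at most two -pr repairs can run at once in this model,
and with 3 nodes and RF=3 they must run one at a time. With vnodes, each node
owns many small ranges spread around the ring, so in practice almost every
pair of nodes is likely to share a replica.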
> Allow to run full repairs in 3.0 without additional cost of anti-compaction
> ---------------------------------------------------------------------------
>
> Key: CASSANDRA-13885
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13885
> Project: Cassandra
> Issue Type: Bug
> Reporter: Thomas Steinmaurer
>
> This ticket is basically the result of the discussion in Cassandra user list:
> https://www.mail-archive.com/[email protected]/msg53562.html
> I was asked to open a ticket by Paulo Motta to think about back-porting
> running full repairs without the additional cost of anti-compaction.
> Basically there is no way in 3.0 to run full repairs from several nodes
> concurrently without trouble caused by (overlapping?) anti-compactions.
> Coming from 2.1 this is a major change from an operational POV, basically
> breaking any solution (e.g. cron-job based) that kicks off -pr based repairs
> on several nodes concurrently.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)