[
https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147579#comment-15147579
]
Marcus Olsson commented on CASSANDRA-10070:
-------------------------------------------
{quote}
All data centers involved in a repair must be available for a repair to
start/succeed, so if we make the lock resource dc-aware and try to create the
lock by contacting a node in each involved data center with LOCAL_SERIAL
consistency, that should be sufficient to ensure correctness without the need
for a global lock. This will also play along well with both the dc_parallelism
global option and the --local or --dcs table repair options.
{quote}
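For illustration, the per-dc lock could be as simple as a lightweight
transaction against a lock table, executed through a coordinator in each
involved data center so that LOCAL_SERIAL keeps the Paxos round local to that
data center. A rough sketch (the keyspace, table and column names are made up
for the example and not part of any proposal, and the DataStax Java driver is
used only for brevity):
{code:java}
// Sketch only, assuming a lock table along the lines of:
//   CREATE TABLE repair_scheduling.lock (resource text PRIMARY KEY, host uuid);
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

import java.util.UUID;

public class RepairLockExample
{
    /**
     * Try to take a lock resource for one data center. The statement is expected
     * to be executed through a coordinator in that data center, so that
     * LOCAL_SERIAL limits the Paxos round to the replicas there. The TTL means
     * the lock has to be renewed periodically while the repair is running.
     */
    public static boolean tryLock(Session session, String resource, UUID hostId, int ttlInSeconds)
    {
        SimpleStatement insert = new SimpleStatement(
                "INSERT INTO repair_scheduling.lock (resource, host)" +
                " VALUES (?, ?) IF NOT EXISTS USING TTL " + ttlInSeconds,
                resource, hostId);
        insert.setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL);
        insert.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        return session.execute(insert).wasApplied();
    }
}
{code}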
{quote}
The second alternative is probably the most desirable. Actually, dc_parallelism
by itself might cause problems, since we can have a situation where all repairs
run on a single node or range, overloading those nodes. If we are to support
concurrent repairs in the first pass, I think we need both the dc_parallelism
and node_parallelism options together.
{quote}
{quote}
This is becoming a bit complex and there are probably some edge cases and/or
starvation scenarios, so we should think carefully about them before jumping
into implementation. What do you think about this approach? Should we stick to
a simpler non-parallel version in the first pass, or think this through and
support parallelism already in the first version?
{quote}
I like the approach of using local serial for each dc and having specialized
keys. I think we could include the dc parallelism lock with
"RepairResource-\{dc}-\{i}", but only allow one repair per data center by
hardcoding "i" to 1 in the first pass. This should make the upgrades easier
when we do allow parallel repairs. I like the node locks approach as well, but
as you say there are probably some edge cases, so we could wait with adding
them until we allow parallel repairs. I don't think introducing them later
would break the upgrades.
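To make the upgrade path concrete, the resource naming could be something like
the sketch below, where the per-dc index is hardcoded to 1 for now and later
simply becomes configurable (the class and method names are made up for
illustration):
{code:java}
// Illustrative only: lock resource naming for per-dc repair parallelism.
import java.util.ArrayList;
import java.util.List;

public final class RepairResource
{
    // First pass: only one repair per data center, so the index is fixed to 1.
    private static final int DC_PARALLELISM = 1;

    private RepairResource()
    {
    }

    // Candidate lock resources for a data center, e.g. "RepairResource-dc1-1".
    // When parallel repairs are allowed later, DC_PARALLELISM simply becomes
    // configurable and the existing resource names stay valid.
    public static List<String> candidatesFor(String dataCenter)
    {
        List<String> resources = new ArrayList<>();
        for (int i = 1; i <= DC_PARALLELISM; i++)
            resources.add(String.format("RepairResource-%s-%d", dataCenter, i));
        return resources;
    }
}
{code}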
{quote}
We should also think more about possible failure scenarios and network
partitions. What happens if the node cannot renew locks in a remote DC due to a
temporary network partition, but the repair is still running? We should
probably cancel a repair if we are not able to renew the lock, and also have
some kind of garbage collector that kills ongoing repair sessions without
associated locks, to avoid violating the configured dc_parallelism and
node_parallelism.
{quote}
I agree, and we could probably store the parent repair session id in an extra
column of the lock table and have a thread wake up periodically to see if there
are repair sessions without locks. But then we must somehow be able to
differentiate between user-defined and automatically scheduled repair sessions.
It could be done by having all repairs go through this scheduling interface,
which would also reduce user mistakes with multiple repairs running in
parallel. Another alternative is to have a custom flag in the parent repair
that makes the garbage collector ignore it if it's user-defined. I think that
the garbage collector / cancelling repairs when unable to renew the lock is
something that should be included in the first pass.
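A rough sketch of what such a garbage collector could look like, reading the
parent repair session ids back from the lock table and skipping user-defined
repairs; all of the collaborator interfaces below are hypothetical and not
existing Cassandra APIs:
{code:java}
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: periodically cancel scheduled repair sessions that no
// longer hold a lock. The collaborators below are assumptions, not existing API.
public class RepairLockGarbageCollector
{
    public interface LockTable
    {
        // Parent repair session ids stored in an extra column of the lock table.
        Set<UUID> sessionsWithLocks();
    }

    public interface RepairTracker
    {
        Set<UUID> runningParentSessions();
        boolean isUserDefined(UUID parentSession); // user-triggered repairs are ignored
        void cancel(UUID parentSession);
    }

    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final LockTable locks;
    private final RepairTracker repairs;

    public RepairLockGarbageCollector(LockTable locks, RepairTracker repairs)
    {
        this.locks = locks;
        this.repairs = repairs;
    }

    public void start(long intervalSeconds)
    {
        executor.scheduleAtFixedRate(this::collect, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    private void collect()
    {
        Set<UUID> locked = locks.sessionsWithLocks();
        for (UUID session : repairs.runningParentSessions())
        {
            // A scheduled session without a lock has either lost it or never had
            // one, so it is cancelled to respect the dc/node parallelism limits.
            if (!locked.contains(session) && !repairs.isUserDefined(session))
                repairs.cancel(session);
        }
    }
}
{code}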
The most basic failure scenarios should be covered by retrying a repair if it
fails and logging a warning/error based on how many times it has failed. Could
the retry behaviour cause any unexpected consequences?
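As a starting point, the retry could be as simple as the sketch below, logging
a warning per failed attempt and an error once the retry budget is exhausted
(the names and thresholds are made up for illustration):
{code:java}
import java.util.concurrent.Callable;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical retry loop: warn on early failures, error once the retry budget
// is exhausted. Names and limits are illustrative only.
public final class RepairRetry
{
    private static final Logger logger = LoggerFactory.getLogger(RepairRetry.class);

    public static boolean runWithRetry(Callable<Boolean> repairTask, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                if (repairTask.call())
                    return true;
                logger.warn("Repair attempt {}/{} failed, retrying", attempt, maxAttempts);
            }
            catch (Exception e)
            {
                logger.warn("Repair attempt {}/{} failed, retrying", attempt, maxAttempts, e);
            }
        }
        logger.error("Repair failed after {} attempts, giving up until the next scheduled run", maxAttempts);
        return false;
    }
}
{code}
Bounding the number of attempts should at least prevent a permanently failing
repair from being retried forever.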
> Automatic repair scheduling
> ---------------------------
>
> Key: CASSANDRA-10070
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10070
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Marcus Olsson
> Assignee: Marcus Olsson
> Priority: Minor
> Fix For: 3.x
>
> Attachments: Distributed Repair Scheduling.doc
>
>
> Scheduling and running repairs in a Cassandra cluster is most often a
> required task, but it can be hard for new users and it also requires a bit of
> manual configuration. There are good tools out there that can be used to
> simplify things, but wouldn't this be a good feature to have inside of
> Cassandra? To automatically schedule and run repairs, so that when you start
> up your cluster it basically maintains itself in terms of normal
> anti-entropy, with the possibility for manual configuration.