[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147579#comment-15147579 ]
Marcus Olsson commented on CASSANDRA-10070:
-------------------------------------------

{quote}
All data centers involved in a repair must be available for a repair to start/succeed, so if we make the lock resource dc-aware and try to create the lock by contacting a node in each involved data center with LOCAL_SERIAL consistency, that should be sufficient to ensure correctness without the need for a global lock. This will also play along well with both the global dc_parallelism option and with the --local or --dcs table repair options.
{quote}

{quote}
The second alternative is probably the most desirable. Actually, dc_parallelism by itself might cause problems, since we can have a situation where all repairs run on a single node or range, overloading those nodes. If we are to support concurrent repairs in the first pass, I think we need both the dc_parallelism and node_parallelism options together.
{quote}

{quote}
This is becoming a bit complex, and there are probably some edge cases and/or starvation scenarios that we should think through carefully before jumping into implementation. What do you think about this approach? Should we stick to a simpler non-parallel version in the first pass, or think this through and already support parallelism in the first version?
{quote}

I like the approach of using local serial for each dc and having specialized keys. I think we could include the dc parallelism lock with "RepairResource-\{dc}-\{i}", but only allow one repair per data center by hardcoding "i" to 1 in the first pass. This should make upgrades easier when we do allow parallel repairs. I like the node locks approach as well, but as you say there are probably some edge cases, so we could wait with adding them until we allow parallel repairs; I don't think introducing them later would break upgrades.

{quote}
We should also think more carefully about possible failure scenarios and network partitions.
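As an aside, the locking scheme discussed above could be sketched roughly as follows: one "RepairResource-\{dc}-1" resource per data center ("i" hardcoded to 1 in the first pass), acquired in every involved data center or not at all. This is purely illustrative; a real implementation would do the compare-and-set through a lock table in Cassandra using LWT at LOCAL_SERIAL rather than an in-memory map, and all class and method names here are hypothetical:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative, in-memory model of the proposed DC-aware repair locks.
// In a real cluster the locks would live in a Cassandra table written with
// CAS (LWT) at LOCAL_SERIAL; a ConcurrentHashMap stands in for that here.
public final class RepairLockService {

    // Lock resource -> parent repair session id that holds it.
    private final Map<String, String> locks = new ConcurrentHashMap<>();

    // "RepairResource-{dc}-{i}", with i hardcoded to 1 in the first pass
    // so only one scheduled repair runs per data center.
    static String resource(String dc) {
        return "RepairResource-" + dc + "-1";
    }

    /**
     * Try to take the lock in every involved data center. If any DC is
     * already locked (or unreachable, in a real cluster), release what was
     * taken and fail, so a repair only starts when all DCs are available.
     */
    public boolean tryLockAllDcs(Set<String> dcs, String repairSessionId) {
        Set<String> acquired = new HashSet<>();
        for (String dc : dcs) {
            String key = resource(dc);
            // putIfAbsent models "INSERT ... IF NOT EXISTS" at LOCAL_SERIAL.
            if (locks.putIfAbsent(key, repairSessionId) == null) {
                acquired.add(key);
            } else {
                acquired.forEach(locks::remove); // roll back partial locks
                return false;
            }
        }
        return true;
    }

    /** Release all locks held by a repair session (repair done, or cancelled). */
    public void release(String repairSessionId) {
        locks.values().removeIf(repairSessionId::equals);
    }

    /**
     * Garbage-collector sweep: given the scheduled repair sessions believed
     * to be running, return those that no longer hold any lock and should
     * therefore be cancelled.
     */
    public Set<String> sweep(Set<String> runningScheduledSessions) {
        Set<String> toCancel = new HashSet<>();
        for (String session : runningScheduledSessions) {
            if (!locks.containsValue(session)) {
                toCancel.add(session);
            }
        }
        return toCancel;
    }
}
```

A garbage-collector thread could then periodically call something like sweep() with the known scheduled sessions and cancel any that hold no lock, which would also catch sessions whose locks could not be renewed during a partition.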
What happens if the node cannot renew locks in a remote DC due to a temporary network partition, but the repair is still running? We should probably cancel a repair when we are unable to renew the lock, and also have some kind of garbage collector to kill ongoing repair sessions without associated locks, to protect against disrespecting the configured dc_parallelism and node_parallelism.
{quote}

I agree, and we could probably store the parent repair session id in an extra column of the lock table and have a thread wake up periodically to see if there are repair sessions without locks. But then we must somehow be able to differentiate user-defined and automatically scheduled repair sessions. It could be done by having all repairs go through this scheduling interface, which would also reduce user mistakes with multiple repairs running in parallel. Another alternative is to have a custom flag in the parent repair that makes the garbage collector ignore it if it's user-defined.

I think that the garbage collector/cancelling repairs when unable to lock is something that should be included in the first pass. The most basic failure scenarios should be covered by retrying a repair if it fails and logging a warning/error based on how many times it has failed. Could the retry behaviour cause some unexpected consequences?

> Automatic repair scheduling
> ---------------------------
>
>                 Key: CASSANDRA-10070
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10070
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
>             Fix For: 3.x
>
>         Attachments: Distributed Repair Scheduling.doc
>
>
> Scheduling and running repairs in a Cassandra cluster is most often a
> required task, but this can both be hard for new users and it also requires a
> bit of manual configuration. There are good tools out there that can be used
> to simplify things, but wouldn't this be a good feature to have inside of
> Cassandra?
> To automatically schedule and run repairs, so that when you start
> up your cluster it basically maintains itself in terms of normal
> anti-entropy, with the possibility for manual configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)