This document does a really good job of laying out some of the issues involved in coordinating repair scheduling. Regardless of which camp you fall into, it is certainly worth a read.
On Wed, Apr 4, 2018 at 8:10 AM, Joseph Lynch <joe.e.ly...@gmail.com> wrote:

> I just want to say I think it would be great for our users if we moved
> repair scheduling into Cassandra itself. The team here at Netflix has
> opened the ticket <https://issues.apache.org/jira/browse/CASSANDRA-14346>
> and have written a detailed design document
> <https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#heading=h.iasguic42ger>
> that includes problem discussion and prior art if anyone wants to
> contribute to that. We tried to fairly discuss existing solutions, what
> their drawbacks are, and a proposed solution.
>
> If we were to put this as part of the main Cassandra daemon, I think it
> should probably be marked experimental and of course be something that
> users opt into (table by table or cluster by cluster) with the
> understanding that it might not fully work out of the box the first time we
> ship it. We have to be willing to take risks but we also have to be honest
> with our users. It may help build confidence if a few major deployments use
> it (such as Netflix) and we are happy of course to provide that QA as best
> we can.
>
> -Joey
>
> On Tue, Apr 3, 2018 at 10:48 AM, Blake Eggleston <beggles...@apple.com>
> wrote:
>
>> Hi dev@,
>>
>> The question of the best way to schedule repairs came up on
>> CASSANDRA-14346, and I thought it would be good to bring up the idea of an
>> external tool on the dev list.
>>
>> Cassandra lacks any sort of tools for automating routine tasks that are
>> required for running clusters, specifically repair. Regular repair is a
>> must for most clusters, like compaction. This means that, especially as far
>> as eventual consistency is concerned, Cassandra isn’t totally functional
>> out of the box. Operators either need to find a 3rd party solution or
>> implement one themselves. Adding this to Cassandra would make it easier to
>> use.
>>
>> Is this something we should be doing? If so, what should it look like?
>>
>> Personally, I feel like this is a pretty big gap in the project and would
>> like to see an out of process tool offered. Ideally, Cassandra would just
>> take care of itself, but writing a distributed repair scheduler that you
>> trust to run in production is a lot harder than writing a single process
>> management application that can failover.
>>
>> Any thoughts on this?
>>
>> Thanks,
>>
>> Blake