[
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424533#comment-16424533
]
Blake Eggleston commented on CASSANDRA-14346:
---------------------------------------------
I agree the status quo is not the way to go. The two options I'm thinking of
are:
1. Cassandra schedules repairs in process
2. Cassandra ships with a tool that schedules repairs out of process.
While I agree that yet another external tool wouldn't be great for Cassandra, a
tool that ships as part of the Cassandra offering is a different matter, and I
think it would be good for the project. So I'm 100% in favor of scheduling
repairs _somehow_ as part of Cassandra. To me, the question is whether it's
better to do that in or out of process, and how that works. Sorry if I didn't
make that clear.
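For concreteness, option 2 could be as small as a scheduler loop that shells
out to nodetool. A minimal sketch, assuming nodetool is on the PATH; the
keyspace list and the 24h interval are placeholders for illustration, not part
of any proposal:
{code:java}
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of option 2: an out-of-process tool driving repair through the
// existing nodetool interface. Keyspaces and interval are illustrative.
public class RepairScheduler
{
    private static final List<String> KEYSPACES = List.of("ks1", "ks2");

    public static void main(String[] args)
    {
        // scheduleWithFixedDelay waits for the previous run to finish
        // before counting the delay, so overlapping repairs can't pile up.
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(RepairScheduler::repairAll, 0, 24, TimeUnit.HOURS);
    }

    private static void repairAll()
    {
        for (String keyspace : KEYSPACES)
        {
            try
            {
                // One keyspace at a time; inheritIO streams nodetool's output.
                Process p = new ProcessBuilder("nodetool", "repair", keyspace).inheritIO().start();
                int exit = p.waitFor();
                if (exit != 0)
                    System.err.println("repair of " + keyspace + " exited with " + exit);
            }
            catch (Exception e)
            {
                // A real tool would need retry/backoff and alerting here.
                e.printStackTrace();
            }
        }
    }
}
{code}
The loop itself is trivial; the hard part is handling the failure cases, which
is the point of the quote below.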
The main point of my disagreement wasn't about ipc, it was this:
{quote}
The point about being in process making it easier to detect and react to
failures is where I’m really not convinced. There might be some straightforward
failures that you’d be able to pick up on, but the real problem you need to
solve is a distributed one. Specifically, you need a way to recover when the
repair coordinator misses the success or failure message from a remote sync
task. If you haven’t solved that, then you’ve only solved part of the problem
and are just guessing. That’s something you can’t solve in process, and is
going to require some internode communication. Also, solving that problem would
probably provide the infrastructure you need to detect and resolve failures
that aren’t as difficult to detect.
{quote}
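To make the missed-message problem concrete: if the coordinator only reacts to
pushed success/failure messages, one dropped message wedges the session
forever. Recovery means the coordinator can also pull status from the
participant after a timeout. A minimal sketch; SyncTaskStatus, StatusClient,
and the poll interval are all hypothetical, and none of this exists in
Cassandra today:
{code:java}
import java.util.concurrent.TimeUnit;

// Hypothetical coordinator-side recovery for a sync task whose
// success/failure message may have been lost in transit. These types stand
// in for whatever internode plumbing a real solution would add.
public class SyncTaskWatcher
{
    enum SyncTaskStatus { RUNNING, SUCCEEDED, FAILED, UNKNOWN }

    interface StatusClient
    {
        // Ask the remote participant directly for the task's current state.
        SyncTaskStatus query(String participant, long taskId) throws Exception;
    }

    private final StatusClient client;

    SyncTaskWatcher(StatusClient client)
    {
        this.client = client;
    }

    // Poll until the task reaches a terminal state, instead of waiting
    // indefinitely for a message that may never arrive.
    SyncTaskStatus awaitCompletion(String participant, long taskId) throws Exception
    {
        while (true)
        {
            SyncTaskStatus status = client.query(participant, taskId);
            if (status == SyncTaskStatus.SUCCEEDED || status == SyncTaskStatus.FAILED)
                return status; // terminal: safe to act on
            if (status == SyncTaskStatus.UNKNOWN)
                return SyncTaskStatus.FAILED; // participant lost the task: fail the session
            TimeUnit.SECONDS.sleep(30); // still running: check again later
        }
    }
}
{code}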
Regarding IPC, the choice between the current JMX setup and inter-thread
communication is sort of a false dilemma. I mean, we're dealing with a
distributed database here. IPC is one of our core competencies. The IPC
problems with repair today are more about poorly designed interfaces than any
inherent limitation of IPC.
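For reference, the JMX interface in question: nodetool drives repair by
calling repairAsync on the StorageService MBean and then listening for JMX
progress notifications. A minimal sketch of that interaction; the MBean name
and the repairAsync signature match what ships today, but the connection
details and notification handling are simplified approximations:
{code:java}
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxRepair
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName("org.apache.cassandra.db:type=StorageService");

            // Repair progress is only reported through notifications; a missed
            // one is exactly the failure mode discussed above.
            mbs.addNotificationListener(name,
                                        (notification, handback) -> System.out.println(notification.getMessage()),
                                        null, null);

            // repairAsync returns a command id immediately; completion is only
            // learned by watching the notification stream.
            Object cmd = mbs.invoke(name, "repairAsync",
                                    new Object[]{ "my_keyspace", Map.of("primaryRange", "true") },
                                    new String[]{ String.class.getName(), Map.class.getName() });
            System.out.println("started repair command " + cmd);

            Thread.sleep(60_000); // crude wait so notifications can arrive
        }
    }
}
{code}
Note the shape of the problem: the client has to hold a connection open and
hope nothing is dropped, which is an interface problem, not an IPC problem.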
> Scheduled Repair in Cassandra
> -----------------------------
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
> Issue Type: Improvement
> Components: Repair
> Reporter: Joseph Lynch
> Priority: Major
> Labels: CommunityFeedbackRequested
> Fix For: 4.0
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes
> sense given that it is necessary to give our users eventual consistency. Most
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar),
> which we spoke about last year at NGCC. Given the positive feedback at NGCC
> we focused on getting it production ready and have now been using it in
> production to repair hundreds of clusters, tens of thousands of nodes, and
> petabytes of data for the past six months. Also based on feedback at NGCC we
> have invested effort in figuring out how to integrate this natively into
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our
> implementation into Cassandra, and have created a [design
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
> showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would
> be greatly appreciated about the interface or v1 implementation features. I
> have tried to call out in the document features which we explicitly consider
> future work (as well as a path forward to implement them in the future)
> because I would very much like to get this done before the 4.0 merge window
> closes, and to do that I think aggressively pruning scope is going to be a
> necessity.