[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420838#comment-16420838
 ] 

Joseph Lynch commented on CASSANDRA-14346:
------------------------------------------

[~adejanovski] thanks for the feedback; I appreciate having more eyes on
this. My general response is "yes, this is a very hard problem, and this
solution does not entirely solve it, but let's take it incrementally". I'm
hoping to get a basic version into 4.0, *marked explicitly as experimental*
and disabled by default. Users can then start using it if they like, and we
can iterate, fixing bugs and adding features as we go. We will absolutely
have global kill switches to turn it off. I'll reply to your specific points
in the doc, and if I get a chance I'll copy the replies back here.

[~bdeggleston]
{quote}
The problem I see with distributing control of cluster level operations like 
repair to the nodes themselves is that it’s more complicated to do correctly 
than it is with a separate management process. You have to deal with failure 
scenarios, internode coordinations, etc, etc. It seems like one of the benefits 
of having a sidecar project like reaper or priam is that you can dispense with 
a lot of the complexity that comes with designing around single points of 
failure, and simplify your management logic.
{quote}
I think some important context is that we just finished implementing this as a
per-node sidecar (in Priam). Having done it with a sidecar, I really think
external processes of any kind are generally the wrong way to do it.
The short version is "JMX is really bad". In particular, reasoning about stuck
vs lost repairs (especially when JMX connections temporarily fail and you lose
notifications for all the repairs in flight) is extremely difficult. We
probably have 2k lines of code just dealing with edge cases: when the sidecar
restarts but Cassandra does not (you have to wait for Cassandra to finish any
existing repairs and guess which ones those were), when Cassandra restarts but
the sidecar does not (you have to wait for Cassandra to come back healthy and
possibly time it out), when a Cassandra repair thread gets stuck forever and
never makes any progress, etc. Fundamentally, a sidecar can't reach in and say
"hey, you should be heartbeating constantly, and if you stop making progress
I will kill you". You also have to manage repair configuration through an
additional table rather than table configs, and you have to credential the
sidecar so it can speak to both JMX and CQL. The main benefit of a sidecar, in
my opinion, is that it can use a different Cassandra cluster to coordinate all
cluster repairs, but I think if we did it right we might be able to have the
in-Cassandra implementation do this as well.
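
To make the "heartbeat or I will kill you" point concrete, here is a rough
sketch of the kind of in-process watchdog I have in mind. It is not code from
Priam or Cassandra: the class name, the method names, and the injectable clock
are all made up for illustration, and a real version would need to hook into
the actual repair threads and cleanup paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical in-process watchdog; none of these names exist in Cassandra. */
final class RepairWatchdog {
    /** A registered repair: the thread doing the work and its last heartbeat. */
    private static final class Tracked {
        final Thread worker;
        final AtomicLong lastBeatMillis;

        Tracked(Thread worker, long now) {
            this.worker = worker;
            this.lastBeatMillis = new AtomicLong(now);
        }
    }

    private final long timeoutMillis;
    private final LongSupplier clock; // injectable so the logic can be tested
    private final Map<String, Tracked> running = new ConcurrentHashMap<>();

    RepairWatchdog(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    /** Start tracking a repair; registration counts as the first heartbeat. */
    void register(String repairId, Thread worker) {
        running.put(repairId, new Tracked(worker, clock.getAsLong()));
    }

    /** Called by the repair thread whenever it makes progress. */
    void heartbeat(String repairId) {
        Tracked t = running.get(repairId);
        if (t != null) {
            t.lastBeatMillis.set(clock.getAsLong());
        }
    }

    /** Called when a repair finishes normally. */
    void complete(String repairId) {
        running.remove(repairId);
    }

    /**
     * Interrupts and stops tracking every repair whose last heartbeat is
     * older than the timeout. Returns the ids that were interrupted.
     */
    List<String> reapStuck() {
        long now = clock.getAsLong();
        List<String> killed = new ArrayList<>();
        for (Map.Entry<String, Tracked> e : running.entrySet()) {
            if (now - e.getValue().lastBeatMillis.get() > timeoutMillis) {
                e.getValue().worker.interrupt();
                running.remove(e.getKey()); // safe on ConcurrentHashMap
                killed.add(e.getKey());
            }
        }
        return killed;
    }
}
```

Something like `reapStuck()` would run on a timer inside the same JVM as the
repair threads, which is exactly what a sidecar talking over JMX cannot do:
there is no notification to lose, and "stuck" is decided from state the
process already owns rather than guessed from the outside.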

{quote}
Maybe a better solution here is to provide an official sidecar ops tool for 
cassandra? It’s not trivial suggestion, I know, but every cluster needs one. I 
also feel like there's some momentum building around the idea in the developer 
community. I think it would be worth it to talk about that, before going too 
far with this.
{quote}
I liked this idea when Sankalp and Dinesh proposed it to us last week, and I
still like it a lot. I'll keep this in mind during the port so that if we do
end up with a sidecar by 4.0 we can easily switch to it if we decide that's
better. Personally, I think it would only be better if we moved all the
internal repair state out of Cassandra into the sidecar (much as if we moved
all the internal compaction state out into the sidecar).
 

> Scheduled Repair in Cassandra
> -----------------------------
>
>                 Key: CASSANDRA-14346
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Repair
>            Reporter: Joseph Lynch
>            Priority: Major
>              Labels: CommunityFeedbackRequested
>             Fix For: 4.0
>
>         Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.



