[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443622#comment-16443622
 ] 

Kurt Greaves edited comment on CASSANDRA-14346 at 4/19/18 6:51 AM:
-------------------------------------------------------------------

I like the idea of scheduled repairs being handled by Cassandra. I think a 
sidecar is a better choice purely for isolation from the read/write path, but I 
think we need to fix up the interface to repair first. As Blake mentioned, 
most problems come from the fact that JMX sucks and managing repairs over JMX 
is worse. I think as part of this work (or as a first step) we should define 
this interface better and make it far more robust.

I think we should target the initial work for 4.0 - sprucing up interfaces so 
that repair is easier to work with and making failure handling fool-proof, as 
at least we'll probably be able to reach agreement on that front in a somewhat 
timely fashion. It seems a bit optimistic to target all the scheduling for 4.0 
at this stage, but I suppose it depends how much time people want to dedicate 
to this. 

Also we should keep in mind CASSANDRA-14395 as there's going to be a lot of 
overlap here if we go down the sidecar route.
{quote}[Adaptive|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit#bookmark=id.x9qx96jfyivi]
 subrange as proposed in the design and eventually making repair much cheaper
{quote}
If referring to incremental repair, wouldn't this already be the case in 4.0? 
Subrange repair works with incremental repair in trunk at the moment, so we 
should already get some major benefits here. Unless I'm missing something...
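
For anyone unfamiliar with the term, here is a minimal sketch of what "subrange" means: one token range is cut into smaller contiguous (start, end] pieces that can each be repaired separately. The function name {{split_token_range}} and the even-split strategy are assumptions made for illustration only; this is not how Cassandra or the proposed design picks subranges, and wrap-around ranges are not handled.

```python
# Illustrative only: evenly split a token range (start, end] into subranges.
# The Murmur3 partitioner's token space is [-2**63, 2**63 - 1].
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(start, end, parts):
    """Return a list of contiguous (start, end] subranges covering (start, end]."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if end <= start:
        raise ValueError("wrap-around ranges are not handled in this sketch")
    width = end - start
    # Never produce more subranges than there are tokens in the range.
    parts = min(parts, width)
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    for sub_start, sub_end in split_token_range(MIN_TOKEN, MAX_TOKEN, 4):
        print(sub_start, sub_end)
```

Each resulting pair could then be passed as the start and end token of a separate repair session, which keeps any single session small and cheap to retry.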

In other news, for interest's sake (slightly off topic), it seems DS is trying to 
do away with [traditional repair, and instead they've gone the query at CL.ALL 
route|https://docs.datastax.com/en/opscenter/6.5/opsc/online_help/services/opscNodeSyncService.html#opscNodeSyncService__compareRepairServicesOV]
 (or similar) in their new "repair" system. I don't think this is a good idea, 
but it's good to keep in mind how everyone is approaching the problem.
  



> Scheduled Repair in Cassandra
> -----------------------------
>
>                 Key: CASSANDRA-14346
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Repair
>            Reporter: Joseph Lynch
>            Priority: Major
>              Labels: CommunityFeedbackRequested
>             Fix For: 4.0
>
>         Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback about 
> the interface or v1 implementation features would be greatly appreciated. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.



