[jira] [Commented] (KUDU-1194) consensus: Allow abort of uncommittable config change ops

Todd Lipcon (JIRA) Sun, 15 May 2016 16:43:06 -0700

    [ https://issues.apache.org/jira/browse/KUDU-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284015#comment-15284015 ]


Todd Lipcon commented on KUDU-1194:
-----------------------------------

[~mpercy] - any thoughts on this one? From reading the description, I couldn't 
quite tell how a server could get into this state. It seems like an uncommon 
scenario at least, so maybe Critical priority is too high? 
What do you think?

> consensus: Allow abort of uncommittable config change ops
> ---------------------------------------------------------
>
>                 Key: KUDU-1194
>                 URL: https://issues.apache.org/jira/browse/KUDU-1194
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Mike Percy
>            Assignee: Mike Percy
>            Priority: Critical
>
> Wanted to capture a few thoughts about manually fixing broken configs or 
> automatically rolling back bad config changes. This isn't a fully baked 
> design; I just wanted to jot down some initial thoughts.
> A general way to (attempt to) abort uncommitted ops is to truncate the Raft 
> log on the leader (and replace the op with a NO_OP or something similar).
> Some thoughts on recovering from "bad" configs:
> * We may hit a situation where there is an in-progress config change 
> operation that will be impossible to commit due to a majority of the nodes in 
> the "target" config being permanently dead. If the leader is still alive, we 
> can provide a timeout on these ops or a way to explicitly (via RPC) abort 
> them by truncating the log.
> * If no leader is alive, and it's impossible to elect one, then we could 
> write an "unsafe" tool only for emergency use that could do something evil 
> like make the follower think that the tool is the new leader and append an 
> unsafe change-config op to the follower's log.
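
A minimal sketch of the leader-side abort described above, assuming a toy in-memory log. All names here (`LeaderLog`, `is_committable`, `abort_uncommitted_from`, `abort_stuck_config_changes`, the `timeout_s` parameter) are hypothetical and are not Kudu's consensus API. The sketch checks whether a pending config change can still reach a majority of its target config, and once a timeout expires, it truncates the log at that op and appends a NO_OP in its place. The set of live nodes is passed in by the caller, since a real leader cannot know for certain that a node is permanently dead.

```python
from dataclasses import dataclass, field
from enum import Enum


class OpType(Enum):
    WRITE = "WRITE"
    CHANGE_CONFIG = "CHANGE_CONFIG"
    NO_OP = "NO_OP"


@dataclass
class LogEntry:
    term: int
    index: int                           # 1-based Raft log index
    op_type: OpType
    new_config: frozenset = frozenset()  # target voters; only used by CHANGE_CONFIG
    appended_at: float = 0.0             # hypothetical leader-local clock, in seconds


def majority(n: int) -> int:
    return n // 2 + 1


def is_committable(voters: frozenset, live: frozenset) -> bool:
    """A config change can only commit if a majority of its target config is alive."""
    return len(voters & live) >= majority(len(voters))


@dataclass
class LeaderLog:
    term: int
    entries: list = field(default_factory=list)
    commit_index: int = 0  # index of the last committed entry (0 = none)

    def append(self, op_type, now, new_config=frozenset()):
        entry = LogEntry(self.term, len(self.entries) + 1, op_type, new_config, now)
        self.entries.append(entry)
        return entry

    def abort_uncommitted_from(self, index, now):
        """Truncate the log starting at `index` and append a NO_OP in its place."""
        if index <= self.commit_index:
            raise ValueError("cannot abort a committed op")
        del self.entries[index - 1:]  # drops the op and every entry after it
        return self.append(OpType.NO_OP, now)

    def abort_stuck_config_changes(self, live, now, timeout_s):
        """Abort the first uncommitted config change that has timed out and
        can no longer gather a majority of its target config; else return None."""
        for e in self.entries[self.commit_index:]:
            if (e.op_type is OpType.CHANGE_CONFIG
                    and now - e.appended_at >= timeout_s
                    and not is_committable(e.new_config, live)):
                return self.abort_uncommitted_from(e.index, now)
        return None


log = LeaderLog(term=3)
log.append(OpType.WRITE, now=0.0)
log.commit_index = 1
log.append(OpType.CHANGE_CONFIG, now=1.0, new_config=frozenset({"A", "B", "C"}))
# B and C are permanently dead, so only 1 of 3 target voters is alive.
noop = log.abort_stuck_config_changes(frozenset({"A"}), now=40.0, timeout_s=30.0)
print(noop.index, noop.op_type.name)
```

The sketch covers only the leader's local bookkeeping. Replicating the truncation needs more care in real Raft, because followers identify entries by (term, index) and would not treat a same-term NO_OP at the same index as a conflicting entry.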



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)