[ 
https://issues.apache.org/jira/browse/KUDU-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated KUDU-1194:
-------------------------------------
    Priority: Major  (was: Critical)

> consensus: Allow abort of uncommittable config change ops
> ---------------------------------------------------------
>
>                 Key: KUDU-1194
>                 URL: https://issues.apache.org/jira/browse/KUDU-1194
>             Project: Kudu
>          Issue Type: Improvement
>          Components: consensus
>            Reporter: Mike Percy
>            Assignee: Mike Percy
>
> I want to capture a few thoughts about manually fixing broken configs or 
> automatically rolling back bad config changes. This isn't a fully baked 
> design; these are just some initial thoughts.
> A general way to (attempt to) abort uncommitted ops is to truncate the Raft 
> log on the leader (and replace the op with a NO_OP or something similar).
> Some thoughts on recovering from "bad" configs:
> * We may hit a situation where there is an in-progress config change 
> operation that will be impossible to commit due to a majority of the nodes in 
> the "target" config being permanently dead. If the leader is still alive, we 
> can provide a timeout on these ops or a way to explicitly (via RPC) abort 
> them by truncating the log.
> * If no leader is alive, and it's impossible to elect one, then we could 
> write an "unsafe" tool, only for emergency use, that could do something evil 
> like make a follower think that the tool is the new leader and append an 
> unsafe change-config op to that follower's log.
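
The leader-side abort described above (truncate the log at the stuck op and put a NO_OP in its place, triggered either explicitly or by a timeout) could look roughly like the sketch below. This is only an illustration of the idea, not Kudu code: LeaderLog, LogEntry, abort_uncommitted, abort_stale_config_changes and the timeout parameter are all hypothetical names, the log is an in-memory list, and the sketch deliberately ignores what followers that already received the aborted op would need to do.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical op type names used only for this sketch.
NO_OP = "NO_OP"
CHANGE_CONFIG = "CHANGE_CONFIG"


@dataclass
class LogEntry:
    index: int          # 1-based position in the log
    term: int           # leader term in which the entry was appended
    op_type: str
    appended_at: float  # leader-local time the entry was appended


@dataclass
class LeaderLog:
    """Toy in-memory stand-in for a leader's Raft log (an assumption)."""
    entries: List[LogEntry] = field(default_factory=list)
    committed_index: int = 0
    current_term: int = 1

    def append(self, op_type: str, now: float = 0.0) -> LogEntry:
        entry = LogEntry(len(self.entries) + 1, self.current_term, op_type, now)
        self.entries.append(entry)
        return entry

    def abort_uncommitted(self, index: int, now: float = 0.0) -> LogEntry:
        """Truncate the log from `index` onward and append a NO_OP there."""
        if index <= self.committed_index:
            raise ValueError("cannot abort an op that is already committed")
        if index > len(self.entries):
            raise IndexError("no op at index %d" % index)
        # Drop the target op and everything after it, since later ops
        # were appended on top of the op being aborted.
        del self.entries[index - 1:]
        return self.append(NO_OP, now)

    def abort_stale_config_changes(self, now: float,
                                   timeout_s: float) -> Optional[LogEntry]:
        """Abort the first uncommitted config change older than timeout_s."""
        for entry in self.entries[self.committed_index:]:
            stale = now - entry.appended_at > timeout_s
            if entry.op_type == CHANGE_CONFIG and stale:
                return self.abort_uncommitted(entry.index, now)
        return None
```

An explicit abort RPC would map onto abort_uncommitted, while a periodic check on the leader would call abort_stale_config_changes. Refusing to touch anything at or below committed_index is the one invariant the sketch enforces; everything else about making this safe across replicas is left open, as in the description.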



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
