[ https://issues.apache.org/jira/browse/KUDU-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated KUDU-1194:
-------------------------------------
Priority: Major (was: Critical)
> consensus: Allow abort of uncommittable config change ops
> ---------------------------------------------------------
>
> Key: KUDU-1194
> URL: https://issues.apache.org/jira/browse/KUDU-1194
> Project: Kudu
> Issue Type: Improvement
> Components: consensus
> Reporter: Mike Percy
> Assignee: Mike Percy
>
> Wanted to capture a few thoughts about manually fixing broken configs or
> automatically rolling back bad config changes. This isn't a fully baked
> design; I just wanted to jot down some initial thoughts.
> A general way to (attempt to) abort uncommitted ops is to truncate the Raft
> log on the leader (and replace the op with a NO_OP or something similar).
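A minimal Python sketch of the truncate-and-replace idea, assuming a toy in-memory leader log. `LogEntry`, `LeaderLog`, `abort_uncommitted`, and the op-type strings are hypothetical names, not Kudu's API. The sketch deliberately ignores the hard parts, which is why the description says "(attempt to)". Standard Raft leaders never delete their own entries (the Leader Append-Only property). An op that already reached a majority may be committed by a future leader. Reusing the aborted op's term and index for the NO_OP means followers that already hold the original entry would not notice the replacement, because Raft identifies entries by term and index.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogEntry:
    term: int
    index: int    # 1-based log index
    op_type: str  # hypothetical labels: "WRITE", "CHANGE_CONFIG", "NO_OP"


@dataclass
class LeaderLog:
    current_term: int
    commit_index: int = 0  # index of the last committed entry (0 = none)
    entries: List[LogEntry] = field(default_factory=list)

    def append(self, op_type: str) -> LogEntry:
        # Entries are stored contiguously, so the next index is len + 1.
        entry = LogEntry(self.current_term, len(self.entries) + 1, op_type)
        self.entries.append(entry)
        return entry

    def abort_uncommitted(self) -> List[LogEntry]:
        """Drop every entry after commit_index and append a NO_OP in its place.

        Returns the aborted entries. Safety checks a real implementation
        would need (e.g. whether the op already reached a majority, and how
        followers learn about the replacement) are intentionally omitted.
        """
        if self.commit_index >= len(self.entries):
            return []  # nothing uncommitted to abort
        aborted = self.entries[self.commit_index:]
        del self.entries[self.commit_index:]
        self.append("NO_OP")
        return aborted


# Usage: one committed write followed by a stuck config change.
log = LeaderLog(current_term=3)
log.append("WRITE")
log.commit_index = 1
log.append("CHANGE_CONFIG")
aborted = log.abort_uncommitted()
print([e.op_type for e in aborted])      # ['CHANGE_CONFIG']
print([e.op_type for e in log.entries])  # ['WRITE', 'NO_OP']
```

The NO_OP lands at the aborted op's index, so the log stays contiguous and later appends continue from there.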
> Some thoughts on recovering from "bad" configs:
> * We may hit a situation where an in-progress config change operation can
> never be committed because a majority of the nodes in the "target" config
> are permanently dead. If the leader is still alive, we can abort these ops
> by truncating the log, triggered either by a timeout on the ops or
> explicitly via an RPC.
> * If no leader is alive and it's impossible to elect one, we could write an
> "unsafe", emergency-use-only tool that does something evil, such as making
> the follower think that the tool is the new leader and then appending an
> unsafe change-config op to the follower's log.
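One way to sketch the leader-side timeout/explicit-abort decision from the first bullet, in Python. `PendingConfigChange`, `should_abort`, `timeout_s`, and `abort_requested` are hypothetical names, not Kudu's API. The explicit abort RPC is modeled as a boolean flag, and the abort itself would still be carried out by truncating the log, as the description proposes. The commit rule follows the description's framing: the op needs acknowledgements from a majority of the target config.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PendingConfigChange:
    target_voters: Set[str]  # voters in the config the op would install
    started_at: float        # time (seconds) the op was appended
    acked: Set[str] = field(default_factory=set)  # voters that replicated it

    def majority(self) -> int:
        return len(self.target_voters) // 2 + 1

    def is_committed(self) -> bool:
        # Only acknowledgements from target-config voters count.
        return len(self.acked & self.target_voters) >= self.majority()


def should_abort(change: PendingConfigChange, now: float, timeout_s: float,
                 abort_requested: bool = False) -> bool:
    """Decide whether the leader should abort a pending config change.

    abort_requested stands in for a hypothetical explicit abort RPC.
    The timeout is a proxy: the leader cannot tell "permanently dead"
    apart from "slow", so it gives up after timeout_s seconds.
    """
    if change.is_committed():
        return False  # never abort an op that has already committed
    if abort_requested:
        return True
    return now - change.started_at >= timeout_s


# Usage: a 3-voter target config where only "a" has acknowledged the op.
change = PendingConfigChange({"a", "b", "c"}, started_at=0.0, acked={"a"})
print(should_abort(change, now=5.0, timeout_s=30.0))   # False
print(should_abort(change, now=31.0, timeout_s=30.0))  # True
```

Checking `is_committed()` first matters: once an op has committed, Raft guarantees it is durable, so truncating it at that point would break correctness rather than repair a stuck config.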
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)