[
https://issues.apache.org/jira/browse/KUDU-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185330#comment-15185330
]
Todd Lipcon commented on KUDU-1338:
-----------------------------------
http://gerrit.cloudera.org:8080/#/c/2483/ has a small test change that seems to
reproduce this issue. After a config change op is aborted, it's still left as
"pending", which prevents future config changes from happening. My guess is
that this also might be able to produce divergent leaders, since we use the
pending config for leader election requests, right?
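
To make the suspected failure mode concrete, here is a toy sketch in Python. It is not Kudu's actual code: the class, the method names, and the `clear_pending` flag are all made up for illustration, and the peer names are placeholders. It only models the idea that an aborted config change which never clears the pending config will cause every later change to be rejected, and that the stale pending config is what would be consulted afterwards.

```python
class IllegalState(Exception):
    """Stand-in for an 'Illegal state' status; hypothetical, not Kudu's type."""


class ToyConsensusState:
    """Toy model of committed/pending Raft configs (illustrative only)."""

    def __init__(self, committed_config):
        self.committed_config = list(committed_config)
        self.pending_config = None

    def set_pending_config(self, new_config):
        # Only one config change may be in flight at a time.
        if self.pending_config is not None:
            raise IllegalState(
                "RaftConfig change currently pending. "
                "Only one is allowed at a time.")
        self.pending_config = list(new_config)

    def abort_config_change(self, clear_pending):
        # clear_pending=False models the suspected bug: the op is aborted
        # but the pending config is left in place.
        if clear_pending:
            self.pending_config = None

    def active_config(self):
        # The pending config, if any, takes precedence over the committed one.
        if self.pending_config is not None:
            return self.pending_config
        return self.committed_config


def try_second_change(clear_pending_on_abort):
    """Abort one config change, then attempt another; report the outcome."""
    state = ToyConsensusState(["A", "B", "C"])
    state.set_pending_config(["A", "B"])
    state.abort_config_change(clear_pending=clear_pending_on_abort)
    try:
        state.set_pending_config(["A", "B", "D"])
    except IllegalState:
        return "rejected", state.active_config()
    return "accepted", state.active_config()
```

With `clear_pending_on_abort=False` the second change is rejected and the active config is still the aborted one, which is the state the test change appears to reproduce. With `True` the second change goes through.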
> Tablet stuck in RaftConfig change currently pending
> ---------------------------------------------------
>
> Key: KUDU-1338
> URL: https://issues.apache.org/jira/browse/KUDU-1338
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: 0.7.0
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Attachments: KUDU_TSERVER.node-2.internal.gz,
> KUDU_TSERVER.node-3.internal.gz, KUDU_TSERVER.node-5.internal.gz, logs.tgz
>
>
> We've been adapting the consensus logs for a while and I think we can finally
> get to the bottom of this issue. I'm attaching the logs from the 3 nodes that
> participated in the same config for tablet eaa1877a2b3540cf8202aff844c6ca79.
> ITBLL is driving the load and eventually fails at 2016-02-15 14:53:12,005
> trying to write to node-2 AKA a1081edd2ca24f6b9dcdd7e5000f95ec. The peer that
> gets stuck is node-5 AKA cdec7fdacbac4ad1b095275b3bdbbe5c, starting from this
> line:
> {noformat}
> I0215 14:28:41.585695 2020 raft_consensus_state.cc:459] T
> eaa1877a2b3540cf8202aff844c6ca79 P cdec7fdacbac4ad1b095275b3bdbbe5c [term 69
> FOLLOWER]: Illegal state: RaftConfig change currently pending. Only one is
> allowed at a time.
> {noformat}
> The chaos monkey running on this setup is dropping packets one node at a time.
> I'll attach the logs in a moment.