[
https://issues.apache.org/jira/browse/KUDU-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432525#comment-15432525
]
zhangsong edited comment on KUDU-1576 at 8/23/16 9:52 AM:
----------------------------------------------------------
The timeline is as follows:
Initially, replica 2 is the leader.
At I0819 11:08:39.935573, replica 2 goes down (according to the last line of
kudu-tserver.INFO), which triggers the other two followers to request votes.
At I0819 11:09:02.246912, replica 1 wins the election.
At I0819 11:09:06.675046 20157, replica 3 goes down (according to the last line of
kudu-tserver.INFO).
At W0819 11:09:11.407464, a consensus timeout message related to replica 3
appears (according to kudu-tserver.INFO on replica 1).
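
The gaps between the events in the timeline above can be worked out directly from the log timestamps. The snippet below is only a sketch: the event labels and helper names are made up for illustration, and the year 2016 is an assumption taken from the comment date, since the log prefix itself carries only month and day.

```python
from datetime import datetime

# Timestamps copied from the timeline above (prefix format: MMDD HH:MM:SS.ffffff).
# The labels are illustrative summaries, not log text.
EVENTS = [
    ("replica 2 down", "0819 11:08:39.935573"),
    ("replica 1 won election", "0819 11:09:02.246912"),
    ("replica 3 down", "0819 11:09:06.675046"),
    ("consensus timeout seen on replica 1", "0819 11:09:11.407464"),
]


def parse_log_time(stamp, year=2016):
    """Parse 'MMDD HH:MM:SS.ffffff'; the year is an assumption (not in the log)."""
    return datetime.strptime(f"{year}{stamp}", "%Y%m%d %H:%M:%S.%f")


def gaps(events):
    """Return (label, seconds since the previous event) for events 2..n."""
    times = [parse_log_time(stamp) for _, stamp in events]
    return [
        (events[i + 1][0], (times[i + 1] - times[i]).total_seconds())
        for i in range(len(times) - 1)
    ]


if __name__ == "__main__":
    for label, seconds in gaps(EVENTS):
        print(f"{label}: +{seconds:.6f}s")
```

Read this way, replica 3 went down only a few seconds after replica 1 won the election.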
was (Author: brucesz):
Last line of kudu-tserver.INFO on replica 2:
I0819 11:08:39.935573 12831 multi_column_writer.cc:85]...
Last line of kudu-tserver.INFO on replica 3:
I0819 11:09:06.675046 20157 raft_consensus.cc:380] ..
> raft-config will stay in pending state a long time in node crash situation.
> ---------------------------------------------------------------------------
>
> Key: KUDU-1576
> URL: https://issues.apache.org/jira/browse/KUDU-1576
> Project: Kudu
> Issue Type: Bug
> Reporter: zhangsong
>
> After two physical nodes crashed, I found that one of my tables had become
> read-only. After some searching, I found that both followers of a
> tablet were in the down state, although the web UI still listed those
> followers. I tried to recover the table with the kudu-admin tool's
> change_config, and it failed with the message below:
> Pending config: local: false peers { permanent_uuid:
> "515ab1adcbd64081b646a86133f5f60d" member_type: VOTER last_known_addr { host:
> "one_of_follower" port: 7052 } } peers { permanent_uuid:
> "3a77ef5039f447d29db5a44c92279a7a" member_type: VOTER last_known_addr { host:
> "current_leader" port: 7052 } }
> It seems that after one of the raft-config members went down, and while the
> current leader was trying to replicate the config, peer
> "515ab1adcbd64081b646a86133f5f60d" crashed. In that case the config just
> stays pending, because it can never be accepted by a majority.
> It would be better to have some mechanism to fix this, at least
> manually.
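
The majority argument in the quoted description can be illustrated with a small sketch. It uses the generic Raft rule that a config change needs acknowledgement from more than half of the voters. It is not Kudu's implementation; the function names are hypothetical, and the assumption that only the current leader is still alive is taken from the description.

```python
def majority(num_voters):
    """Smallest number of voters that is more than half of num_voters."""
    return num_voters // 2 + 1


def can_commit(voter_uuids, live_uuids):
    """True if enough voters in the config are alive to form a majority."""
    live_voters = [uuid for uuid in voter_uuids if uuid in live_uuids]
    return len(live_voters) >= majority(len(voter_uuids))


# The two VOTER peers listed in the pending config from the error message.
PENDING_CONFIG = [
    "515ab1adcbd64081b646a86133f5f60d",  # follower that crashed
    "3a77ef5039f447d29db5a44c92279a7a",  # current leader
]

# Assumption from the description: only the current leader is still running.
LIVE = {"3a77ef5039f447d29db5a44c92279a7a"}

if __name__ == "__main__":
    print(majority(len(PENDING_CONFIG)))         # 2
    print(can_commit(PENDING_CONFIG, LIVE))      # False
```

With two voters the majority is two, so a single surviving peer can never commit the pending config on its own, which matches the stuck state described above.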
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)