[ 
https://issues.apache.org/jira/browse/KUDU-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604673#comment-15604673
 ] 

Gordon Gardiner edited comment on KUDU-1483 at 10/25/16 8:42 AM:
-----------------------------------------------------------------

We saw something like this recently on Kudu 1.0.0: one particular tablet has 
two followers, and leader election for it keeps failing.  Every other tablet 
for that table, and for other tables, has a single leader and two followers.  
The logs are continuously spammed with the same kind of messages as those 
posted above.  As a result the table is effectively read-only.  Fortunately 
this is only a QA cluster, and it was just a test table with a modest amount 
of data.

I believe the table was created under Kudu 0.8 or 0.7.


was (Author: gordon.gardiner1):
We saw something like this recently where for a particular tablet there were 
two followers and the election of a leader keeps failing.  Every other tablet 
for that table and others has a single leader and two followers.  This just 
keeps spamming the logs with the same kind of messages posted above.  It means 
the table is effectively read only.  Fortunately this is only a QA cluster and 
was just a test table with a modest amount of data.

I believe the table was created under kudu 0.8 or 0.7.

> in some cases, followers cannot promote to leader.
> --------------------------------------------------
>
>                 Key: KUDU-1483
>                 URL: https://issues.apache.org/jira/browse/KUDU-1483
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: zhangsong
>
> In my environment, a tablet has only two followers on the master's web UI, 
> and that situation lasts indefinitely.
> Some log lines about the tablet from the two followers:
> follower1:
> I0613 11:16:33.244365 26846 leader_election.cc:223] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e 
> [CANDIDATE]: Term 31717 election: Requesting vote from peer 
> 8cf59ddd6d154ae99d3b23da840169e0
> W0613 11:16:33.247150 26016 leader_election.cc:281] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e 
> [CANDIDATE]: Term 31717 election: Tablet error from VoteRequest() call to 
> peer 8cf59ddd6d154ae99d3b23da840169e0: Illegal state: Tablet not RUNNING: 
> FAILED: Not found: Can't find block: 1363326557009763249
> I0613 11:16:33.247463 26016 leader_election.cc:248] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e 
> [CANDIDATE]: Term 31717 election: Election decided. Result: candidate lost.
> I0613 11:16:33.248205 17534 raft_consensus.cc:1942] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Snoozing failure detection for election timeout plus an 
> additional 15.536s
> I0613 11:16:33.248245 17534 raft_consensus.cc:1795] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Leader election lost for term 31717. Reason: None given
> I0613 11:16:34.288436 26137 raft_consensus.cc:1298] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Handling vote request from an unknown peer 
> 95bc8f3637ed4a52b53a984052ba6114
> I0613 11:16:34.288633 26137 raft_consensus.cc:1558] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Leader election vote request: Denying vote to candidate 
> 95bc8f3637ed4a52b53a984052ba6114 for earlier term 31666. Current term is 
> 31717.
> I0613 11:16:41.506261 26127 raft_consensus.cc:1298] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Handling vote request from an unknown peer 
> 95bc8f3637ed4a52b53a984052ba6114
> I0613 11:16:41.506325 26127 raft_consensus.cc:1558] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Leader election vote request: Denying vote to candidate 
> 95bc8f3637ed4a52b53a984052ba6114 for earlier term 31667. Current term is 
> 31717.
> I0613 11:16:45.440551 26135 raft_consensus.cc:1298] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Handling vote request from an unknown peer 
> 95bc8f3637ed4a52b53a984052ba6114
> I0613 11:16:45.440625 26135 raft_consensus.cc:1558] T 
> 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Leader election vote request: Denying vote to candidate 
> 95bc8f3637ed4a52b53a984052ba6114 for earlier term 31668. Current term is 
> 31717.
> It seems that there are three followers/voters, and one of them has the 
> tablet in a "not running" state.
> On the other follower:
> W0613 11:16:45.437863 18782 leader_election.cc:281] T 
> 87588b06c65d4898a5b8c29d08b3528d P 95bc8f3637ed4a52b53a984052ba6114 
> [CANDIDATE]: Term 31668 election: Tablet error from VoteRequest() call to 
> peer 8cf59ddd6d154ae99d3b23da840169e0: Illegal state: Tablet not RUNNING: 
> FAILED: Not found: Can't find block: 1363326557009763249
> W0613 11:16:45.438611 18782 leader_election.cc:333] T 
> 87588b06c65d4898a5b8c29d08b3528d P 95bc8f3637ed4a52b53a984052ba6114 
> [CANDIDATE]: Term 31668 election: Vote denied by peer 
> eded59517b14432ab9022cd50d160b8e with higher term. Message: Invalid argument: 
> T 87588b06c65d4898a5b8c29d08b3528d P eded59517b14432ab9022cd50d160b8e [term 
> 31717 FOLLOWER]: Leader election vote request: Denying vote to candidate 
> 95bc8f3637ed4a52b53a984052ba6114 for earlier term 31668. Current term is 
> 31717.
> I0613 11:16:45.439034 18782 leader_election.cc:336] T 
> 87588b06c65d4898a5b8c29d08b3528d P 95bc8f3637ed4a52b53a984052ba6114 
> [CANDIDATE]: Term 31668 election: Cancelling election due to peer responding 
> with higher term
> I0613 11:16:45.440032 21807 raft_consensus.cc:1942] T 
> 87588b06c65d4898a5b8c29d08b3528d P 95bc8f3637ed4a52b53a984052ba6114 [term 
> 31668 FOLLOWER]: Snoozing failure detection for election timeout plus an 
> additional 15.493s
> These log messages repeat again and again. It seems that the follower with 
> the lower term starts a leader election and is denied by the follower with 
> the higher term, and the follower with the higher term doesn't know about 
> the first follower for some reason.
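
The stalemate described in the report can be sketched as a small model. This is not Kudu's implementation. It is a minimal illustration of the basic Raft rule that a vote request carrying an earlier term is denied, plus a majority count. The class and function names (Replica, handle_vote_request, run_election) are hypothetical, the replica names are shortened forms of the UUIDs in the logs, and the peer sets are an assumption inferred from the logs: the term-31717 replica only asks 8cf59ddd... for a vote and treats 95bc8f36... as unknown. Log-recency and already-voted checks are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    uuid: str
    term: int
    running: bool = True
    # UUIDs of the other voters this replica believes are in its config.
    peers: set = field(default_factory=set)

    def handle_vote_request(self, candidate_uuid, candidate_term):
        """Return (granted, reason) using only the term check."""
        if not self.running:
            return False, "tablet not RUNNING"
        if candidate_term < self.term:
            return False, (
                f"earlier term {candidate_term}, current term is {self.term}"
            )
        # Same or newer term: adopt it and grant the vote (simplified).
        self.term = candidate_term
        return True, "granted"


def run_election(candidate, cluster):
    """The candidate votes for itself and asks only the peers it knows."""
    votes = 1
    voters = len(candidate.peers) + 1
    for uuid in candidate.peers:
        granted, _ = cluster[uuid].handle_vote_request(
            candidate.uuid, candidate.term
        )
        votes += int(granted)
    # A strict majority of the candidate's own view of the config is needed.
    return votes > voters // 2


# Scenario modelled on the logs (peer sets are assumed, see the text above).
high = Replica("eded", term=31717, peers={"8cf5"})
low = Replica("95bc", term=31668, peers={"8cf5", "eded"})
failed = Replica("8cf5", term=0, running=False)
cluster = {r.uuid: r for r in (high, low, failed)}

high_wins = run_election(high, cluster)  # only peer is the failed replica
low_wins = run_election(low, cluster)    # denied by both other replicas
print(high_wins, low_wins)
```

Under these assumptions neither candidate can collect a majority: the higher-term replica can only ask the failed replica, and the lower-term replica is denied by the failed replica and by the higher-term one. That matches the repeating pattern in the logs, although it does not explain why the two replicas disagree about the configuration in the first place.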



