[ 
https://issues.apache.org/jira/browse/KUDU-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280664#comment-15280664
 ] 

Todd Lipcon commented on KUDU-1449:
-----------------------------------

This is interesting -- the master should indeed have noticed the 
under-replicated tablet (2 replicas in the config) and tried to create a new 
replica. Maybe the remote bootstrap is timing out or never succeeding? You 
could also be hitting a bug like KUDU-1408.

Probably the best next step to diagnose would be to grep both the master log 
and the tablet server logs for the tablet ID in question, and see if we can 
piece together a timeline.
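
As a rough sketch of that step, the script below filters several log files 
for one tablet ID and merges the matching lines into a single 
time-ordered list. It assumes glog-style line prefixes (severity letter, 
mmdd, then hh:mm:ss.uuuuuu) and that all logs come from the same year; the 
function names and the idea of labelling each file with a source name are 
illustrative, not part of Kudu.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed glog-style prefix: severity letter, mmdd, time with microseconds.
GLOG_PREFIX = re.compile(r"^[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6})")


def build_timeline(
    logs: Dict[str, Iterable[str]], tablet_id: str
) -> List[Tuple[str, str, str]]:
    """Return (timestamp, source, line) tuples mentioning tablet_id, oldest first."""
    events = []
    for source, lines in logs.items():
        for line in lines:
            if tablet_id not in line:
                continue
            match = GLOG_PREFIX.match(line)
            if match is None:
                continue  # skip lines without a recognizable timestamp
            events.append((match.group(1), source, line.rstrip("\n")))
    # "mmdd hh:mm:ss.uuuuuu" sorts correctly as a string within one year.
    events.sort(key=lambda event: event[0])
    return events


def timeline_from_files(
    paths: Dict[str, str], tablet_id: str
) -> List[Tuple[str, str, str]]:
    """Same as build_timeline, reading each source from a file path."""
    logs = {}
    for source, path in paths.items():
        with open(path, errors="replace") as handle:
            logs[source] = handle.readlines()
    return build_timeline(logs, tablet_id)
```

Calling timeline_from_files with a mapping such as {"master": <master log 
path>, "ts1": <tablet server log path>} and the tablet ID gives one merged 
list that can be printed line by line.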

> tablet unavailable caused by  follower can not upgrade to leader.
> -----------------------------------------------------------------
>
>                 Key: KUDU-1449
>                 URL: https://issues.apache.org/jira/browse/KUDU-1449
>             Project: Kudu
>          Issue Type: Bug
>         Environment: jd.com production env
>            Reporter: zhangsong
>            Priority: Critical
>
> 1 background: 5 nodes crashed today due to a system OOM. According to the 
> Raft protocol, Kudu should promote a follower to leader and resume serving, 
> but it did not.  
> Found this error when issuing a query via Impala: "Unable to open scanner: 
> Timed out: GetTableLocations(flow_first_buy_user_0504, bucket=453, string 
> memberid=, int32 cate3_id=-2147483648, int32 cate2_id=-2147483648, int32 
> cate1_id=-2147483648, int32 chan_type=-2147483648, int32 
> county_id=-2147483648, int32 city_id=-2147483648, int32 
> province_id=-2147483648, 1) failed: timed out after deadline expired: timed 
> out after deadline expired
> "  
> 2 analysis:
> Based on the bucket number, the target tablet has only two replicas, which 
> is odd. Meanwhile, the tablet server hosting the leader replica has crashed. 
> The follower cannot become leader in that situation: the configuration has 
> only one leader and one follower, so once the leader is dead the follower 
> cannot obtain a majority of votes (it has only its own vote).
> As a result, the tablet is unavailable even though a follower hosting a 
> replica is still alive.
> After restarting the Kudu server on the node that hosted the previous 
> leader replica, we observed that the previous leader became a follower, the 
> previous follower became leader, and another follower replica was created, 
> giving a 3-replica Raft configuration again.
> 3 modifications:
> A follower should notice the abnormal situation in which the Raft 
> configuration has only two replicas (one leader and one follower) and 
> contact the master to correct it.
> 4 to do:
> What caused the two-replica Raft configuration is still unknown.
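
The election arithmetic in the quoted analysis can be checked with a small 
sketch. It assumes the usual Raft rule that a candidate needs votes from a 
strict majority of the voters in the configuration; the function names are 
illustrative and are not Kudu APIs.

```python
def majority(num_voters: int) -> int:
    """Smallest number of votes that is a strict majority of num_voters."""
    return num_voters // 2 + 1


def can_elect_leader(num_voters: int, num_live: int) -> bool:
    """True if the live voters alone can form a majority."""
    return num_live >= majority(num_voters)


def tolerated_failures(num_voters: int) -> int:
    """How many voters may fail while a leader can still be elected."""
    return num_voters - majority(num_voters)


# Two-replica configuration with the leader dead: one live voter, two needed.
print(can_elect_leader(num_voters=2, num_live=1))
# Three-replica configuration with one replica dead: two live, two needed.
print(can_elect_leader(num_voters=3, num_live=2))
```

Under that rule a two-replica configuration tolerates no failures at all, 
which matches the behaviour described in the report.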



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
