[ 
https://issues.apache.org/jira/browse/KUDU-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225803#comment-15225803
 ] 

Binglin Chang commented on KUDU-1391:
-------------------------------------

leader log:
{noformat}
I0402 01:43:19.914851 13954 catalog_manager.cc:1560] Tablet: 
6a32cfa0353e4175809c2aa67e16ac9e reported consensus state change. New consensus 
state: current_term: 1 leader_uuid: "edc906d1a1ff4df99ec4a2b2c9985992" config { 
opid_index: -1 local: false peers { permanent_uuid: 
"edc906d1a1ff4df99ec4a2b2c9985992" member_type: VOTER last_known_addr { host: 
"c3-hadoop-prc-st212.bj" port: 18700 } } peers { permanent_uuid: 
"a8f15f0482e44a808d681f0aaecdf713" member_type: VOTER last_known_addr { host: 
"c3-hadoop-prc-st216.bj" port: 18700 } } peers { permanent_uuid: 
"bfdd1dd0797b4a66b25e6ff2758ceeec" member_type: VOTER last_known_addr { host: 
"c3-hadoop-prc-st198.bj" port: 18700 } } }
I0403 21:12:20.004005 13949 catalog_manager.cc:1560] Tablet: 
6a32cfa0353e4175809c2aa67e16ac9e reported consensus state change. New consensus 
state: current_term: 1 leader_uuid: "edc906d1a1ff4df99ec4a2b2c9985992" config { 
opid_index: 1997024 local: false peers { permanent_uuid: 
"edc906d1a1ff4df99ec4a2b2c9985992" member_type: VOTER last_known_addr { host: 
"c3-hadoop-prc-st212.bj" port: 18700 } } peers { permanent_uuid: 
"a8f15f0482e44a808d681f0aaecdf713" member_type: VOTER last_known_addr { host: 
"c3-hadoop-prc-st216.bj" port: 18700 } } }
I0403 21:12:20.004144 13949 catalog_manager.cc:2462] T 
00000000000000000000000000000000 P 13c6069ec07b453bb6b76d6ffc1d6242: Deleting 
tablet 6a32cfa0353e4175809c2aa67e16ac9e on peer 
bfdd1dd0797b4a66b25e6ff2758ceeec with delete type TABLET_DATA_TOMBSTONED (TS 
bfdd1dd0797b4a66b25e6ff2758ceeec not found in new config with opid_index 
1997024)
I0403 21:12:20.004248 13949 catalog_manager.cc:2489] Started AddServer task for 
tablet 6a32cfa0353e4175809c2aa67e16ac9e
W0403 21:12:20.004575 19954 catalog_manager.cc:1888] TS 
bfdd1dd0797b4a66b25e6ff2758ceeec: Delete Tablet RPC failed for tablet 
6a32cfa0353e4175809c2aa67e16ac9e: Network error: Client connection negotiation 
failed: client connection to 10.108.72.37:18700: connect: Connection refused 
(error 111)
I0403 21:12:20.004644 19954 catalog_manager.cc:1945] Scheduling retry of 
6a32cfa0353e4175809c2aa67e16ac9e Delete Tablet RPC for 
TS=bfdd1dd0797b4a66b25e6ff2758ceeec with a delay of 53ms (attempt = 1)...
I0403 21:12:20.007455 19954 catalog_manager.cc:2387] AddServer ChangeConfig RPC 
for tablet 6a32cfa0353e4175809c2aa67e16ac9e on peer 
edc906d1a1ff4df99ec4a2b2c9985992 with cas_config_opid_index 1997024: Change 
config succeeded
I0403 21:12:20.007561 13950 catalog_manager.cc:1560] Tablet: 
6a32cfa0353e4175809c2aa67e16ac9e reported consensus state change. New consensus 
state: current_term: 1 leader_uuid: "edc906d1a1ff4df99ec4a2b2c9985992" config { 
opid_index: 1997025 local: false peers { permanent_uuid: 
"edc906d1a1ff4df99ec4a2b2c9985992" member_type: VOTER last_known_addr { host: 
"c3-hadoop-prc-st212.bj" port: 18700 } } peers { permanent_uuid: 
"a8f15f0482e44a808d681f0aaecdf713" member_type: VOTER last_known_addr { host: 
"c3-hadoop-prc-st216.bj" port: 18700 } } peers { permanent_uuid: 
"39195da2a7cf4472aeadbf0b4c88a67a" member_type: VOTER last_known_addr { host: 
"c3-hadoop-prc-st172.bj" port: 18700 } } }
W0403 21:12:20.058163 19954 catalog_manager.cc:1888] TS 
bfdd1dd0797b4a66b25e6ff2758ceeec: Delete Tablet RPC failed for tablet 
6a32cfa0353e4175809c2aa67e16ac9e: Network error: Client connection negotiation 
failed: client connection to 10.108.72.37:18700: connect: Connection refused 
(error 111)
I0403 21:12:20.058200 19954 catalog_manager.cc:1945] Scheduling retry of 
6a32cfa0353e4175809c2aa67e16ac9e Delete Tablet RPC for 
TS=bfdd1dd0797b4a66b25e6ff2758ceeec with a delay of 36ms (attempt = 2)...
W0403 21:12:20.095010 19954 catalog_manager.cc:1888] TS 
bfdd1dd0797b4a66b25e6ff2758ceeec: Delete Tablet RPC failed for tablet 
6a32cfa0353e4175809c2aa67e16ac9e: Network error: Client connection negotiation 
failed: client connection to 10.108.72.37:18700: connect: Connection refused 
(error 111)
{noformat}
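
To make the membership changes in the log above easier to follow, here is a minimal sketch that extracts each reported config (opid_index plus peer UUIDs) and diffs consecutive reports. The helper names `parse_config_reports` and `membership_changes` are made up for illustration and are not part of Kudu; the regexes only assume the "New consensus state:" text format seen in the three reports above, which are copied into the sample as-is.

```python
import re

# Patterns assume the text format of the reports quoted above.
UUID_RE = re.compile(r'permanent_uuid:\s*"([0-9a-f]+)"')
OPID_RE = re.compile(r'opid_index:\s*(-?\d+)')

SAMPLE_LOG = """
I0402 01:43:19.914851 13954 catalog_manager.cc:1560] Tablet:
6a32cfa0353e4175809c2aa67e16ac9e reported consensus state change. New consensus
state: current_term: 1 leader_uuid: "edc906d1a1ff4df99ec4a2b2c9985992" config {
opid_index: -1 local: false peers { permanent_uuid:
"edc906d1a1ff4df99ec4a2b2c9985992" member_type: VOTER last_known_addr { host:
"c3-hadoop-prc-st212.bj" port: 18700 } } peers { permanent_uuid:
"a8f15f0482e44a808d681f0aaecdf713" member_type: VOTER last_known_addr { host:
"c3-hadoop-prc-st216.bj" port: 18700 } } peers { permanent_uuid:
"bfdd1dd0797b4a66b25e6ff2758ceeec" member_type: VOTER last_known_addr { host:
"c3-hadoop-prc-st198.bj" port: 18700 } } }
I0403 21:12:20.004005 13949 catalog_manager.cc:1560] Tablet:
6a32cfa0353e4175809c2aa67e16ac9e reported consensus state change. New consensus
state: current_term: 1 leader_uuid: "edc906d1a1ff4df99ec4a2b2c9985992" config {
opid_index: 1997024 local: false peers { permanent_uuid:
"edc906d1a1ff4df99ec4a2b2c9985992" member_type: VOTER last_known_addr { host:
"c3-hadoop-prc-st212.bj" port: 18700 } } peers { permanent_uuid:
"a8f15f0482e44a808d681f0aaecdf713" member_type: VOTER last_known_addr { host:
"c3-hadoop-prc-st216.bj" port: 18700 } } }
I0403 21:12:20.007561 13950 catalog_manager.cc:1560] Tablet:
6a32cfa0353e4175809c2aa67e16ac9e reported consensus state change. New consensus
state: current_term: 1 leader_uuid: "edc906d1a1ff4df99ec4a2b2c9985992" config {
opid_index: 1997025 local: false peers { permanent_uuid:
"edc906d1a1ff4df99ec4a2b2c9985992" member_type: VOTER last_known_addr { host:
"c3-hadoop-prc-st212.bj" port: 18700 } } peers { permanent_uuid:
"a8f15f0482e44a808d681f0aaecdf713" member_type: VOTER last_known_addr { host:
"c3-hadoop-prc-st216.bj" port: 18700 } } peers { permanent_uuid:
"39195da2a7cf4472aeadbf0b4c88a67a" member_type: VOTER last_known_addr { host:
"c3-hadoop-prc-st172.bj" port: 18700 } } }
"""


def parse_config_reports(log_text):
    """Return [(opid_index, [peer uuids])], one entry per reported config."""
    # The archived log is hard-wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    reports = []
    # Everything before the first marker is not a report, so skip it.
    for chunk in flat.split("New consensus state:")[1:]:
        opid = OPID_RE.search(chunk)
        if opid is None:
            continue
        reports.append((int(opid.group(1)), UUID_RE.findall(chunk)))
    return reports


def membership_changes(reports):
    """Diff consecutive configs: [(new opid_index, removed, added)]."""
    changes = []
    for (_, old), (opid, new) in zip(reports, reports[1:]):
        removed = [u for u in old if u not in new]
        added = [u for u in new if u not in old]
        changes.append((opid, removed, added))
    return changes


if __name__ == "__main__":
    for opid, removed, added in membership_changes(parse_config_reports(SAMPLE_LOG)):
        print(opid, "removed:", removed, "added:", added)
```

On the sample, this reports that bfdd1dd0797b4a66b25e6ff2758ceeec leaves the config at opid_index 1997024 and 39195da2a7cf4472aeadbf0b4c88a67a joins at opid_index 1997025, matching the Delete Tablet and AddServer entries in the log.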


> 2 of 3 replica alive but failed to elect leader
> -----------------------------------------------
>
>                 Key: KUDU-1391
>                 URL: https://issues.apache.org/jira/browse/KUDU-1391
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: Binglin Chang
>         Attachments: 6a32cfa0353e4175809c2aa67e16ac9e.log.st212, 
> 6a32cfa0353e4175809c2aa67e16ac9e.log.st216
>
>
> Last weekend many TS hit a lot of "too many open files" errors (haven't 
> upgraded to [...]). When I used our internal deploy tool to restart the 
> cluster (stop all TS, then start all TS), the control machine had an issue 
> that seemed to block on writes to the ssh terminal (maybe a USB driver issue, 
> not related to this bug), so only half (about 30) of the TS were shut down. 
> After maybe 10 minutes, I switched to another control host and performed the 
> whole restart. 
> Then I saw that writes were blocked because 1 tablet had no leader. In the 
> web UI, 2 of 3 replicas were in follower state and 1 was 
> TABLET_DATA_TOMBSTONED, but all elections failed. I will attach the logs of 
> the 2 followers. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
