[ 
https://issues.apache.org/jira/browse/KUDU-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174093#comment-16174093
 ] 

Andrew Wong commented on KUDU-2154:
-----------------------------------

From HipChat, the logs are as follows:
{{W0920 16:47:36.332541   450 consensus_peers.cc:411] T 
271df8901d98442cb478593babd8a609 P 20d4d86f182043398594b67492d13fdc -> Peer 
c2ea8f22f4034bcc97e26c9236811960 (kudu513-1.gce.cloudera.com:7050): Couldn't 
send request to peer c2ea8f22f4034bcc97e26c9236811960 for tablet 
271df8901d98442cb478593babd8a609. Error code: TABLET_NOT_RUNNING (12). Status: 
Illegal state: Tablet not RUNNING: STOPPED. Retrying in the next heartbeat 
period. Already tried 691 times.}}

The fix for KUDU-1407 only evicted FAILED tablets. This is interesting, though: 
I was under the impression that if something went wrong, we'd end up at FAILED, 
not STOPPED. I'm curious how it got stuck in STOPPED.
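
To make the gap concrete, here is a minimal sketch of the two eviction policies. It is a simplified model and not Kudu's actual code: PeerStatus, both function names, and the retry threshold are hypothetical, chosen only for illustration. The first function models the FAILED-only behavior described above; the second also evicts a replica that has stayed in a non-RUNNING state for too many attempts.

```python
from dataclasses import dataclass

# Hypothetical threshold; the issue does not specify a value.
MAX_NOT_RUNNING_RETRIES = 100


@dataclass
class PeerStatus:
    """Hypothetical view the leader keeps of one remote replica."""
    tablet_state: str      # e.g. "RUNNING", "FAILED", "STOPPED"
    failed_attempts: int   # consecutive requests the peer has rejected


def should_evict_failed_only(peer: PeerStatus) -> bool:
    """Simplified model of the behavior described above: only FAILED evicts."""
    return peer.tablet_state == "FAILED"


def should_evict(peer: PeerStatus,
                 max_retries: int = MAX_NOT_RUNNING_RETRIES) -> bool:
    """Also evict a replica stuck in any non-RUNNING state for too long."""
    if peer.tablet_state == "FAILED":
        return True
    stuck = peer.tablet_state != "RUNNING"
    return stuck and peer.failed_attempts >= max_retries


# The replica from the log above: STOPPED, already tried 691 times.
stuck_peer = PeerStatus(tablet_state="STOPPED", failed_attempts=691)
print(should_evict_failed_only(stuck_peer))
print(should_evict(stuck_peer))
```

Under the FAILED-only policy the STOPPED replica is never evicted, which matches the endless retries in the log. Counting consecutive attempts is just one possible trigger; a time-based limit would work equally well in this sketch.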

> Leader does not evict replica stuck in NOT_RUNNING state
> --------------------------------------------------------
>
>                 Key: KUDU-2154
>                 URL: https://issues.apache.org/jira/browse/KUDU-2154
>             Project: Kudu
>          Issue Type: Improvement
>          Components: consensus
>    Affects Versions: 1.5.0
>            Reporter: Mike Percy
>
> The leader should be able to eventually evict a server stuck in a NOT_RUNNING 
> state, in case the remote is malfunctioning. If the replica fails to Abort() 
> a tablet copy, it will continue to report that it is NOT_RUNNING.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
