[
https://issues.apache.org/jira/browse/KUDU-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174093#comment-16174093
]
Andrew Wong commented on KUDU-2154:
-----------------------------------
From HipChat, the logs are as follows:
{{W0920 16:47:36.332541 450 consensus_peers.cc:411] T
271df8901d98442cb478593babd8a609 P 20d4d86f182043398594b67492d13fdc -> Peer
c2ea8f22f4034bcc97e26c9236811960 (kudu513-1.gce.cloudera.com:7050): Couldn't
send request to peer c2ea8f22f4034bcc97e26c9236811960 for tablet
271df8901d98442cb478593babd8a609. Error code: TABLET_NOT_RUNNING (12). Status:
Illegal state: Tablet not RUNNING: STOPPED. Retrying in the next heartbeat
period. Already tried 691 times.}}
The fix for KUDU-1407 only evicted FAILED tablets. This is interesting, though:
I was under the impression that if something went wrong, we'd end up at FAILED,
not STOPPED. I'm curious how it got stuck in STOPPED.
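For anyone triaging similar reports, here is a rough sketch of pulling the tablet ID, peer ID, error code, and retry count out of a warning like the one quoted above. The regular expression and the helper name parse_peer_error are assumptions made for illustration, based only on the single log line in this comment; they are not part of Kudu, and other log formats may not match.

```python
import re

# Pattern inferred from the one log line quoted in this comment (an assumption,
# not an official Kudu log format specification).
PEER_ERROR_RE = re.compile(
    r"T (?P<tablet>[0-9a-f]+) P (?P<leader>[0-9a-f]+) -> Peer (?P<peer>[0-9a-f]+)"
    r".*?Error code: (?P<code>[A-Z_]+) \((?P<code_num>\d+)\)"
    r".*?Already tried (?P<tries>\d+) times"
)

# The log line from this comment, with the email line wrapping removed.
SAMPLE = (
    "W0920 16:47:36.332541 450 consensus_peers.cc:411] T "
    "271df8901d98442cb478593babd8a609 P 20d4d86f182043398594b67492d13fdc -> Peer "
    "c2ea8f22f4034bcc97e26c9236811960 (kudu513-1.gce.cloudera.com:7050): Couldn't "
    "send request to peer c2ea8f22f4034bcc97e26c9236811960 for tablet "
    "271df8901d98442cb478593babd8a609. Error code: TABLET_NOT_RUNNING (12). Status: "
    "Illegal state: Tablet not RUNNING: STOPPED. Retrying in the next heartbeat "
    "period. Already tried 691 times."
)


def parse_peer_error(line):
    """Return a dict of fields from a peer-error warning, or None if no match."""
    # Collapse any wrapped whitespace so a line pasted from email still matches.
    text = " ".join(line.split())
    match = PEER_ERROR_RE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields from strings to integers.
    fields["code_num"] = int(fields["code_num"])
    fields["tries"] = int(fields["tries"])
    return fields
```

Calling parse_peer_error(SAMPLE) yields the peer c2ea8f22f4034bcc97e26c9236811960 with error code TABLET_NOT_RUNNING and 691 retries, which makes it easier to spot peers with a high retry count when scanning many such lines.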
> Leader does not evict replica stuck in NOT_RUNNING state
> --------------------------------------------------------
>
> Key: KUDU-2154
> URL: https://issues.apache.org/jira/browse/KUDU-2154
> Project: Kudu
> Issue Type: Improvement
> Components: consensus
> Affects Versions: 1.5.0
> Reporter: Mike Percy
>
> The leader should be able to eventually evict a server stuck in a NOT_RUNNING
> state, in case the remote is malfunctioning. If the replica fails to Abort()
> a tablet copy, it will continue to report that it is NOT_RUNNING.
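The behavior requested above can be sketched as a small decision function. This is only an illustration of the idea, not Kudu's implementation: the PeerHealth type, the status strings, and the 300-second threshold are all hypothetical and do not correspond to real Kudu classes or flags.

```python
from dataclasses import dataclass

# Hypothetical grace period before a NOT_RUNNING replica is evicted.
# This is an illustrative value, not a real Kudu flag or default.
MAX_NOT_RUNNING_SECONDS = 300.0


@dataclass
class PeerHealth:
    """Hypothetical summary of what the leader last heard from a peer."""
    status: str               # illustrative values: "RUNNING", "FAILED", "NOT_RUNNING"
    seconds_in_status: float  # how long the peer has reported this status


def should_evict(peer, max_not_running_seconds=MAX_NOT_RUNNING_SECONDS):
    """Decide whether the leader should evict the peer.

    FAILED replicas are evicted right away, which mirrors the KUDU-1407
    behavior described in this thread. NOT_RUNNING replicas are evicted
    only after they have been stuck longer than the grace period, so a
    replica that is merely starting up or copying is not evicted early.
    """
    if peer.status == "FAILED":
        return True
    if peer.status == "NOT_RUNNING":
        return peer.seconds_in_status >= max_not_running_seconds
    return False
```

The grace period is the main design choice here: a replica can legitimately be NOT_RUNNING for a while during bootstrap or tablet copy, so a time-based (or retry-count-based) threshold avoids evicting replicas that would have recovered on their own.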