[ 
https://issues.apache.org/jira/browse/KUDU-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386916#comment-15386916
 ] 

Todd Lipcon commented on KUDU-1407:
-----------------------------------

Actually it seems like this isn't an issue, but a sort of "inverse" is an 
issue: if a tablet _fails_ to start up, it will be _stuck_ in 
TABLET_NOT_RUNNING state (because the state is FAILED). In that case, we 
_should_ evict the old replica in order to repair it, most likely.
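
A minimal sketch of the policy suggested above, not Kudu's actual code. The 
ReplicaState enum, the should_evict function, and the choice to apply the same 
five-minute timeout to FAILED replicas are all assumptions made for 
illustration: a follower that is still bootstrapping is never evicted on the 
timeout alone, while a follower whose tablet failed to start is evicted once 
the timeout elapses.

```python
from enum import Enum, auto


class ReplicaState(Enum):
    """Hypothetical follower states; not Kudu's actual enum."""
    RUNNING = auto()
    BOOTSTRAPPING = auto()  # still starting up, may take a long time
    FAILED = auto()         # startup failed; will not recover on its own


# The five-minute timeout mentioned in the issue description.
EVICTION_TIMEOUT_SECS = 5 * 60


def should_evict(state: ReplicaState, secs_unresponsive: float) -> bool:
    """Decide whether the leader should evict a follower (illustrative only)."""
    if state is ReplicaState.BOOTSTRAPPING:
        # Healthy but slow to come up: do not evict, however long it takes.
        return False
    # FAILED or otherwise unresponsive: evict once the timeout has elapsed.
    return secs_unresponsive >= EVICTION_TIMEOUT_SECS
```

Under these assumptions, should_evict(ReplicaState.BOOTSTRAPPING, 3600) is 
False, while should_evict(ReplicaState.FAILED, 301) is True.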

> Leader should not evict a follower when the follower is in the process of 
> starting up a tablet
> ----------------------------------------------------------------------------------------------
>
>                 Key: KUDU-1407
>                 URL: https://issues.apache.org/jira/browse/KUDU-1407
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> It seems like, if the leader gets an error from one of its followers because 
> the tablet is not running, it considers this replica to be 'unresponsive'. If 
> this happens for 5 minutes, it will evict that follower to try to create a 
> new replica.
> This can cause problems at cluster startup time when there is a lot of data 
> and a cold disk cache - the startup bootstrap process might take more than 
> five minutes, and leaders might end up evicting followers that are perfectly 
> healthy (just in the process of coming up).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
