[ 
https://issues.apache.org/jira/browse/KUDU-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated KUDU-1278:
-------------------------------------
    Target Version/s: 0.7.0  (was: 0.8.0)

> Tablets that take >5 minutes to copy will never remote bootstrap
> ----------------------------------------------------------------
>
>                 Key: KUDU-1278
>                 URL: https://issues.apache.org/jira/browse/KUDU-1278
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 0.6.0
>            Reporter: Todd Lipcon
>            Assignee: Binglin Chang
>            Priority: Blocker
>             Fix For: 0.7.0
>
>
> [~decster] and I debugged this issue on his cluster. One of the servers had 
> been shut down due to bad RAM, which triggered remote bootstrap of all of its 
> tablets to create new replicas.
> During remote bootstrap, the leader replica continues to try to replicate 
> operations to the new follower while that follower is still bootstrapping. 
> Each attempt causes the leader to try to trigger remote bootstrap again, 
> which fails with a "Remote bootstrap already in progress" error. The leader 
> treats this as an unsuccessful communication with the follower. After 5 
> minutes of receiving this error, it decides that the follower is dead, 
> evicts it, and requests another new replica. When the evicted replica 
> finishes copying, it finds out that it has been evicted and deletes 
> everything it just copied. This cycle repeats forever.
> We need to fix the leader so that, as long as the remote bootstrapping 
> replica is making progress, we don't consider it dead.
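
The eviction cycle described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Kudu's implementation: the `FollowerTracker` class, the status labels, and the `simulate` helper are all hypothetical names invented for the example, and the only value taken from the report is the 5-minute window. The sketch also simplifies the proposed fix by treating every "already in progress" response as evidence of progress. A real fix would presumably need to confirm that the copy is actually advancing.

```python
from dataclasses import dataclass

# The 5-minute window mentioned in the report, in seconds.
EVICTION_TIMEOUT_S = 5 * 60


@dataclass
class FollowerTracker:
    """Hypothetical leader-side view of one follower (not a real Kudu class)."""

    last_ok_s: float = 0.0
    # False models the reported behavior; True models the proposed direction.
    count_bootstrap_progress: bool = False

    def on_response(self, now_s: float, status: str) -> None:
        # Status labels are illustrative: "ok", "bootstrap_in_progress", "error".
        if status == "ok":
            self.last_ok_s = now_s
        elif status == "bootstrap_in_progress" and self.count_bootstrap_progress:
            # Assumption: an in-progress bootstrap counts as a live follower.
            self.last_ok_s = now_s

    def should_evict(self, now_s: float) -> bool:
        return now_s - self.last_ok_s > EVICTION_TIMEOUT_S


def simulate(copy_duration_s: float, count_bootstrap_progress: bool,
             heartbeat_s: float = 1.0) -> bool:
    """Return True if the follower is evicted before its copy finishes."""
    tracker = FollowerTracker(count_bootstrap_progress=count_bootstrap_progress)
    now_s = 0.0
    while now_s < copy_duration_s:
        now_s += heartbeat_s
        # While copying, every leader attempt gets the "already in progress" error.
        tracker.on_response(now_s, "bootstrap_in_progress")
        if tracker.should_evict(now_s):
            return True
    return False
```

In this model, `simulate(600, False)` reports an eviction because the copy outlasts the window, `simulate(240, False)` does not because the copy finishes in time, and `simulate(600, True)` does not because the in-progress responses keep the follower counted as alive.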



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
