[ https://issues.apache.org/jira/browse/KUDU-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1408:
------------------------------
    Target Version/s: 1.0.0  (was: 0.9.0)

> Adding a replica may never succeed if copying tablet takes longer than the 
> log retention time
> ---------------------------------------------------------------------------------------------
>
>                 Key: KUDU-1408
>                 URL: https://issues.apache.org/jira/browse/KUDU-1408
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, tserver
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> Currently, while a remote bootstrap session is in progress, we anchor the 
> logs from the time at which it started. However, as soon as the session 
> finishes, we drop the anchor, which allows those logs to be deleted. When 
> the tablet copy itself takes longer than the log retention period, a 
> scenario like the following is likely:
> - TS A starts downloading from TS B. It plans to download segments 1-4 and 
> adds an anchor.
> - TS B handles writes for 20 minutes, rolling the log many times (e.g. up to 
> log segment 20)
> - TS A finishes downloading, and ends the remote bootstrap session
> - TS B no longer has an anchor, so it GCs log segments 1-16.
> - TS A finishes opening the tablet it just copied, but is immediately unable 
> to catch up (because it has only segments 1-4, while the leader has only 17-20)
> - TS B evicts TS A
> This loop repeats indefinitely, until the write workload on TS B stops.
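
The scenario above can be reproduced with a toy model. This is only a sketch
under stated assumptions: the names (SourceLog, copy_attempt) are hypothetical
and are not Kudu APIs, and retention is modeled as "keep the newest N
segments" rather than the time-based retention the issue describes.

```python
# Toy model of the scenario; every name here is hypothetical, not a Kudu API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceLog:
    """Log segments held by the copy source (TS B in the scenario)."""
    segments: List[int] = field(default_factory=lambda: [1, 2, 3, 4])
    anchor: Optional[int] = None  # lowest segment pinned by a copy session
    retain: int = 4               # assumption: keep the newest 4 segments

    def roll_to(self, newest: int) -> None:
        """Writes keep arriving, so the log rolls up to segment `newest`."""
        self.segments.extend(range(self.segments[-1] + 1, newest + 1))

    def gc(self) -> None:
        """Delete old segments, but nothing at or above the anchor."""
        cutoff = self.segments[-1] - self.retain + 1
        if self.anchor is not None:
            cutoff = min(cutoff, self.anchor)
        self.segments = [s for s in self.segments if s >= cutoff]


def copy_attempt(src: SourceLog, rolls_during_copy: int) -> bool:
    """Run one copy session; return True if the new replica can catch up."""
    copied = list(src.segments)   # TS A plans to download these segments
    src.anchor = copied[0]        # session start: anchor the logs
    src.roll_to(src.segments[-1] + rolls_during_copy)
    src.gc()                      # deletes nothing while the anchor is held
    src.anchor = None             # session end: the anchor is dropped
    src.gc()                      # unanchored old segments are deleted
    # TS A can catch up only if there is no gap between the last segment
    # it copied and the oldest segment the source still has.
    return copied[-1] + 1 >= src.segments[0]
```

With 16 rolls during the copy, the source ends up holding only segments 17-20
and the attempt fails. A retry against the same source fails the same way,
which is the loop described in the issue. With only a few rolls the copy
succeeds, since the source still holds the segment following the copied ones.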



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
