[ 
https://issues.apache.org/jira/browse/KUDU-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15451414#comment-15451414
 ] 

Todd Lipcon commented on KUDU-1408:
-----------------------------------

I've been working on this for the last few days. I made some changes that do 
the following:
- remove time-based retention for logs
- instead, retain up to 'max_log_segments' logs if they are needed to catch up 
any follower
- propagate the all_replicated index from leader to followers
- add some throttled logging about replica lag
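
To make the retention rule above concrete, here is a minimal sketch in Python 
(not the actual Kudu code, which is C++). The function name, the 
(seqno, last_op_index) tuple layout, and the choice to always keep the newest 
segment are assumptions made for illustration; only 'max_log_segments' and the 
all_replicated index come from the description above. Other anchors on the log 
are ignored.

```python
from typing import List, Tuple


def segments_to_gc(segments: List[Tuple[int, int]],
                   all_replicated_index: int,
                   max_log_segments: int) -> List[int]:
    """Return the sequence numbers of log segments that may be deleted.

    segments: (seqno, last_op_index) pairs, sorted oldest first.
    all_replicated_index: highest op index replicated to every follower.
    max_log_segments: cap on segments retained for lagging followers.
    """
    gc = []
    remaining = len(segments)
    # Hypothetical simplification: the newest (active) segment is never GCed.
    for seqno, last_op_index in segments[:-1]:
        # A segment is still needed if some follower has not yet replicated
        # every op it contains.
        needed = last_op_index > all_replicated_index
        over_cap = remaining > max_log_segments
        if needed and not over_cap:
            # This segment and all newer ones are retained for catch-up.
            break
        gc.append(seqno)
        remaining -= 1
    return gc


if __name__ == "__main__":
    segs = [(1, 100), (2, 200), (3, 300), (4, 400)]
    # Every follower has replicated up to op 250, so segments 1-2 can go.
    print(segments_to_gc(segs, all_replicated_index=250, max_log_segments=500))
    # A follower is far behind, but the cap of 2 forces GC of the oldest logs.
    print(segments_to_gc(segs, all_replicated_index=0, max_log_segments=2))
```

Note that there is no time component: a segment is dropped only because no 
follower needs it any more, or because the cap has been exceeded.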

I'm doing some tests with a high-throughput write on a 5-node cluster with 
max_log_segments set to 500. It seems like indeed sometimes one replica just 
can't keep up with the others. As soon as it starts lagging, the lag only gets 
worse and worse over time until it eventually has fallen 500 segments behind 
and a new copy gets created somewhere else.

I don't see an obvious reason why this replica is slower, though looking at 
top, it clearly is using way more CPU than another node which has a similar 
number of tablets. So, this investigation might lead somewhere unexpected :)

> Adding a replica may never succeed if copying tablet takes longer than the 
> log retention time
> ---------------------------------------------------------------------------------------------
>
>                 Key: KUDU-1408
>                 URL: https://issues.apache.org/jira/browse/KUDU-1408
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, tserver
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> Currently, while a remote bootstrap session is in progress, we anchor the 
> logs from the time at which it started. However, as soon as the session 
> finishes, we drop the anchor and delete any logs that are no longer 
> retained. If the tablet copy itself takes longer than the log retention 
> period, a scenario like the following is likely:
> - TS A starts downloading from TS B. It plans to download segments 1-4 and 
> adds an anchor.
> - TS B handles writes for 20 minutes, rolling the log many times (e.g. up to 
> log segment 20)
> - TS A finishes downloading, and ends the remote bootstrap session
> - TS B no longer has an anchor, so it GCs logs 1-16.
> - TS A finishes opening the tablet it just copied, but immediately is unable 
> to catch up (because it only has segments 1-4, but the leader only has 17-20)
> - TS B evicts TS A
> This loop will go on basically forever until the write workload stops on TS B.
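
The failure in the quoted scenario comes down to a gap between the last 
segment the new replica holds and the oldest segment the leader still retains. 
A rough Python sketch of that check follows. It works at segment granularity 
for simplicity, and the function name and the use of a range for the leader's 
retained segments are assumptions for illustration, not Kudu's API.

```python
def can_catch_up(follower_last_segment: int, leader_retained: range) -> bool:
    """Return True if the leader still has every segment the follower lacks.

    follower_last_segment: newest segment number the follower copied.
    leader_retained: contiguous segment numbers still on the leader.
    """
    # The follower needs the segment right after the last one it holds;
    # if the leader's oldest retained segment is newer, there is a gap.
    return leader_retained.start <= follower_last_segment + 1


if __name__ == "__main__":
    # Scenario from the description: TS A copied segments 1-4, while
    # TS B has GCed 1-16 and retains only 17-20.
    print(can_catch_up(4, range(17, 21)))
    # If the anchor had been kept, TS B would still hold 1-20.
    print(can_catch_up(4, range(1, 21)))
```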



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
