[ https://issues.apache.org/jira/browse/KUDU-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18066546#comment-18066546 ]
Marton Greber commented on KUDU-3752:
-------------------------------------
[~aserbin] mentioned the following in another channel: "for Kudu masters we
could enable a special WAL segment anchoring mode where segments can be GC-ed
only if all of the system catalog tablet replicas are caught up to the GC
cut-off."
^ I think this is worth considering.
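
To make the idea concrete, here is a minimal sketch of how such an anchoring
rule could clamp the GC cut-off. It is not Kudu code: the function name, the
parameters, and the replica-to-index map are hypothetical, and it assumes each
replica reports the last log index it has replicated.

```python
from typing import Dict


def anchored_gc_cutoff(requested_cutoff: int,
                       replica_last_index: Dict[str, int]) -> int:
    """Return the highest log index up to which WAL may be GC-ed.

    Hypothetical model of the proposed anchoring mode: the cut-off is
    held back to the index of the slowest replica, so no replica ever
    needs a WAL entry that has already been collected.
    """
    if not replica_last_index:
        # No replicas are tracked, so nothing anchors the log.
        return requested_cutoff
    slowest = min(replica_last_index.values())
    return min(requested_cutoff, slowest)


# One replica lags at index 95, so the cut-off is clamped to 95.
lagging = anchored_gc_cutoff(110, {"a": 120, "b": 95, "c": 130})

# All replicas are past 110, so the requested cut-off is used as-is.
caught_up = anchored_gc_cutoff(110, {"a": 120, "b": 115, "c": 130})
```

The trade-off in this sketch is that a replica that stays down indefinitely
would pin the WAL and let it grow without bound.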
> No auto-recovery when WAL fallback target is also GC'd
> ------------------------------------------------------
>
> Key: KUDU-3752
> URL: https://issues.apache.org/jira/browse/KUDU-3752
> Project: Kudu
> Issue Type: Bug
> Reporter: Marton Greber
> Priority: Major
>
> When a leader detects an LMP_MISMATCH with a follower, it attempts to
> reconcile by falling back to the follower's last committed index and
> replaying the WAL from that point. If the WAL for that fallback index has
> already been garbage collected, Kudu correctly logs "the follower will never
> be able to catch up" but then takes no corrective action. The LMP_MISMATCH
> status is retried in an infinite loop, and the follower remains permanently
> stuck.
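
The failure mode described above can be modelled with a small simulation. This
is an illustrative sketch only, not Kudu's implementation: the function names
are hypothetical, and it assumes the leader needs the WAL entry at the fallback
index itself and that nothing changes between retries.

```python
from typing import List


def wal_available(fallback_index: int, min_retained_index: int) -> bool:
    """True if the WAL entry at the fallback index has not been GC-ed."""
    return fallback_index >= min_retained_index


def simulate_retries(follower_last_committed: int,
                     min_retained_index: int,
                     attempts: int) -> List[str]:
    """Record the outcome of each reconciliation attempt.

    The real loop is unbounded; 'attempts' caps it so the sketch ends.
    """
    outcomes = []
    for _ in range(attempts):
        if wal_available(follower_last_committed, min_retained_index):
            outcomes.append("REPLAY")
            break
        # Nothing changes between attempts, so every retry fails alike.
        outcomes.append("LMP_MISMATCH")
    return outcomes


# The follower committed up to index 40, but the WAL now starts at 100.
stuck = simulate_retries(40, 100, attempts=3)

# The fallback index is still retained, so replay can start right away.
recoverable = simulate_retries(150, 100, attempts=3)
```

In this model the stuck case never produces a different outcome, however many
attempts are allowed, which matches the infinite retry reported in the issue.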