[
https://issues.apache.org/jira/browse/KUDU-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Percy updated KUDU-1096:
-----------------------------
Parent: KUDU-434
> Re-replication support for Kudu beta
> ------------------------------------
>
> Key: KUDU-1096
> URL: https://issues.apache.org/jira/browse/KUDU-1096
> Project: Kudu
> Issue Type: Sub-task
> Components: consensus
> Affects Versions: Feature Complete
> Reporter: Mike Percy
> Assignee: Mike Percy
> Priority: Critical
>
> We want to add initial support for re-replication for the beta release.
> Design:
> # When a leader detects that a follower has fallen behind to the point that
> it can't catch up, it will trigger a "remove server" config change.
> # When the master gets a report from a tablet and sees that the number of
> replicas in the config is less than the table's desired replication, it will
> itself start a task to create a new replica (see the sketch below).
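> A minimal sketch of that master-side check, in C++ with illustrative
> stand-in types (TabletReport and StartAddReplicaTask are hypothetical
> names, not Kudu's actual internals):
> {code:cpp}
> #include <string>
> #include <vector>
>
> // Illustrative stand-ins for the master's in-memory state, not Kudu's
> // actual internal types.
> struct TabletReport {
>   std::string tablet_id;
>   std::vector<std::string> replica_uuids;  // peers in the committed config
> };
>
> // Hypothetical async task that selects a new tserver and asks the tablet's
> // leader to add it to the config (declaration only, for the sketch).
> void StartAddReplicaTask(const std::string& tablet_id);
>
> void HandleTabletReport(const TabletReport& report, int desired_replication) {
>   // If the committed config has fewer peers than the table wants, kick off
>   // a task to create a new replica on some live tserver.
>   if (static_cast<int>(report.replica_uuids.size()) < desired_replication) {
>     StartAddReplicaTask(report.tablet_id);
>   }
> }
> {code}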
> Details:
> # Let's start with choosing randomly among any tservers whose most recent
> heartbeat arrived within the last 3 heartbeat periods, as a reasonable proxy
> for "live tservers" (see the sketch after this list). Later we can do
> something smarter like "power of two choices" or load-aware placement.
> Random placement isn't optimal, but it also has the least risk of causing
> weird emergent behavior.
> # The master task will call AddServer() to add the newly selected replica.
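> A minimal sketch of the random "live tserver" selection from step 1, again
> with hypothetical types (TSInfo is a stand-in, not Kudu's actual tserver
> descriptor):
> {code:cpp}
> #include <chrono>
> #include <random>
> #include <string>
> #include <vector>
>
> using Clock = std::chrono::steady_clock;
>
> // Illustrative tserver descriptor.
> struct TSInfo {
>   std::string uuid;
>   Clock::time_point last_heartbeat;
> };
>
> // Choose uniformly at random among tservers whose most recent heartbeat
> // arrived within the last 3 heartbeat periods. Returns nullptr if no
> // tserver qualifies.
> const TSInfo* PickRandomLiveTS(const std::vector<TSInfo>& tservers,
>                                Clock::duration heartbeat_period,
>                                std::mt19937* rng) {
>   const Clock::time_point cutoff = Clock::now() - 3 * heartbeat_period;
>   std::vector<const TSInfo*> live;
>   for (const TSInfo& ts : tservers) {
>     if (ts.last_heartbeat >= cutoff) live.push_back(&ts);
>   }
>   if (live.empty()) return nullptr;
>   std::uniform_int_distribution<size_t> dist(0, live.size() - 1);
>   return live[dist(*rng)];
> }
> {code}
> A real implementation would also need to exclude tservers that already host
> a replica of the tablet in question.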
> Additional possible refinements:
> # We should also trigger this same process if the leader detects that it
> hasn't successfully sent a request to a follower in N heartbeat periods.
> # We should build in a safety net here for the case where the follower is
> actually still in the middle of bootstrapping and making progress; otherwise
> we could flap.
> # We probably want to prohibit the leader from doing this unless it knows
> it's still within its "lease period" (see the sketch after this list).
> Otherwise, if we are already down to a 2-node config and the leader itself
> has some issue, we might too easily drop to a 1-node config.
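> A minimal sketch of the lease check in refinement 3, assuming a hypothetical
> LeaderState that tracks when the leader last heard from a majority:
> {code:cpp}
> #include <chrono>
>
> using Clock = std::chrono::steady_clock;
>
> // Hypothetical leader-side state: the lease is the window during which the
> // leader, having recently heard from a majority, knows it cannot have been
> // deposed without noticing.
> struct LeaderState {
>   Clock::time_point last_majority_contact;  // last round-trip to a majority
>   Clock::duration lease_period;             // e.g. a few heartbeat periods
> };
>
> // Only allow the leader to evict a follower while its lease is fresh;
> // otherwise a partitioned or failing leader could shrink a healthy config
> // (e.g. from 2 nodes to 1) and strand the tablet.
> bool MaySafelyEvictFollower(const LeaderState& leader) {
>   return Clock::now() - leader.last_majority_contact < leader.lease_period;
> }
> {code}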
> Pros:
> * A fairly simple approach to re-replication that is easy to implement.
> Cons:
> * Availability is less than optimal. For example, if a follower falls far
> enough behind the log that the leader removes it from the Raft config, and
> the leader itself fails at the same time (e.g. due to a bad disk), then
> administrator intervention will be required to bring the tablet back online,
> since the only remaining replica will be unable to get elected.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)