[
https://issues.apache.org/jira/browse/IGNITE-18640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denis Chudov updated IGNITE-18640:
----------------------------------
Reviewer: Vladislav Pyatkov
> Implement placement driver best-effort single actor selector and fail-over
> --------------------------------------------------------------------------
>
> Key: IGNITE-18640
> URL: https://issues.apache.org/jira/browse/IGNITE-18640
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexander Lapin
> Assignee: Denis Chudov
> Priority: Major
> Labels: ignite-3
> Time Spent: 10m
> Remaining Estimate: 0h
>
> h3. Motivation
> As a prerequisite, it's worth mentioning that the placement driver itself
> should be reliable and have corresponding fail-over logic, meaning that the
> placement driver service should be distributed in such a way that if one of
> its nodes fails, another one picks up the flag. On the other hand, although
> it's valid to have more than one active PD actor (the one that checks the
> topology, sends leaseGrant msg, etc.), it's better to have only one in order
> to reduce unnecessary calculations, messaging duplication and so on. So, to
> sum up:
> * PD may work on top of meta storage, using it as a consensus provider.
> * There may be more than one active PD actor trying to evaluate the primary
> replica along with the corresponding lease, send leaseGrant msg, etc.,
> meaning that actions should be idempotent or that we should be able to skip
> stale/concurrent triggers.
> * It's worth having at least best-effort single actor selection logic.
> h3. Definition of Done
> * Almost always (not strictly always, given the best-effort nature) there's
> only one active PD actor as long as the ms group has a majority.
> * If for some reason the active actor fails, another one will pick up the
> flag as fast as possible.
> * It's still valid to have multiple active actors at the same time. If you
> guys have any ideas on how to guarantee no more than one actor, please share
> them.
> h3. Implementation Notes
> Assuming that we have a distributed onLeaderElected(Peer leader, long term)
> callback, we may implement the following logic in
> PlacementDriverManager#start():
> * register ms.onLeaderElected()
> {code:java}
> ms.onLeaderElected((leader, term) -> {
>     if (term > lastSeenTerm) {
>         // Remember the newest term so that older updates are ignored.
>         lastSeenTerm = term;
>         if (leader.equals(localNode)) {
>             // Become an active actor.
>         } else {
>             // Discard activeness.
>         }
>     } else {
>         // No-op, just a stale update.
>     }
> });{code}
> * call refreshLeader and apply exactly the same logic as the one mentioned
> above in order to become an active actor if a leader had already been
> elected by the time the listener was registered.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)