[ 
https://issues.apache.org/jira/browse/IGNITE-18640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-18640:
----------------------------------
    Reviewer: Vladislav Pyatkov

> Implement placement driver best-effort single actor selector and fail-over
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-18640
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18640
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Assignee: Denis Chudov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> As a prerequisite, it's worth mentioning that the placement driver (PD)
> itself should be reliable and have corresponding fail-over logic: the
> placement driver service should be distributed in such a way that if one of
> its nodes fails, another one picks up the flag. On the other hand, although
> it's valid to have more than one active PD actor (the one that checks the
> topology, sends leaseGrant messages, etc.), it's better to have only one in
> order to reduce unnecessary calculations, message duplication and so on. So,
> to sum up:
>  * PD may work on top of meta storage, using it as a consensus provider.
>  * There may be more than one active PD actor trying to evaluate the primary
> replica along with the corresponding lease, send leaseGrant messages, etc.,
> meaning that actions should be idempotent or that we should be able to skip
> stale/concurrent triggers.
>  * It is worth having at least best-effort single actor selection logic.
> h3. Definition of Done
>  * Almost always (but not always, because of the best-effort nature) there's
> only one active PD actor if there's a majority in the meta storage (ms) group.
>  * If the active actor fails for some reason, another one will pick up the
> flag as fast as possible.
>  * It's still valid to have multiple active actors at the same time. If you
> have any ideas on how to guarantee no more than one actor, please share them.
> h3. Implementation Notes
> Assuming that we have a distributed onLeaderElected(Peer leader, long term)
> callback, we may implement the following logic in
> PlacementDriverManager#start():
>  * Register the ms.onLeaderElected() listener:
> {code:java}
> ms.onLeaderElected((leader, term) -> {
>     if (term > lastSeenTerm) {
>         // Remember the newest term so that older updates are ignored.
>         lastSeenTerm = term;
>         if (leader.equals(localNode)) {
>             // Become an active actor.
>         } else {
>             // Discard activeness.
>         }
>     } else {
>         // No-op, just a stale update.
>     }
> });{code}
>  * Call refreshLeader and apply exactly the same logic as the one mentioned
> above in order to become an active actor if a leader had already been
> elected when the listener was registered.
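
The two steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Ignite implementation: the class name ActiveActorTracker is hypothetical, nodes are identified by plain strings rather than by the Peer type, and lastSeenTerm is advanced on every accepted notification so that older terms are treated as stale.

```java
/**
 * Hypothetical helper illustrating term-based, best-effort single actor selection.
 * Node identity is modeled as a String; the real Peer type is not assumed here.
 */
final class ActiveActorTracker {
    private final String localNode;

    /** Highest leader term observed so far; -1 means nothing has been observed yet. */
    private long lastSeenTerm = -1;

    /** Whether the local node currently considers itself the active actor. */
    private boolean active;

    ActiveActorTracker(String localNode) {
        this.localNode = localNode;
    }

    /**
     * Handles a leader notification. The same method is meant to be used both by
     * the registered listener and for the result of the initial leader refresh.
     */
    synchronized void onLeaderElected(String leader, long term) {
        if (term <= lastSeenTerm) {
            return; // Stale or duplicate update: no-op.
        }

        lastSeenTerm = term;

        // Become active if the local node is the leader, otherwise discard activeness.
        active = localNode.equals(leader);
    }

    synchronized boolean isActive() {
        return active;
    }
}
```

Routing the initial refreshLeader result through the same method means that a leader elected before the listener was registered is handled in the same way as one reported by the listener, and whichever notification arrives second for the same term is ignored.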



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
