denis-chudov commented on code in PR #1676:
URL: https://github.com/apache/ignite-3/pull/1676#discussion_r1114229495


##########
modules/distribution-zones/tech-notes/rebalance.md:
##########
@@ -55,18 +55,62 @@ But for the new one we have an idea, which doesn't need the 
metastore at all:
 - On rebalanceDone/rebalanceError/leaderElected events the local event listener sends a message to the PrimaryReplica with the description of the event
 - If PrimaryReplica is not available - we should retry sending until the leader finds itself outdated (in this case, the new leader will send a leaderElected event to PrimaryReplica and receive the rebalance request again).
 
-### 4. Stop the redundant replicas and update replicas clients
+### 4. Update the rebalance state after successful rebalance
+Within a single atomic metastore invoke we must update the keys according to the following pseudo-code:
+```
+    metastoreInvoke: // atomic
+        zoneId.assignment.stable = newPeers
+        remove(zoneId.assignment.cancel)
+        if empty(zoneId.assignment.planned):
+            zoneId.assignment.pending = empty
+        else:
+            zoneId.assignment.pending = zoneId.assignment.planned
+            remove(zoneId.assignment.planned)
+```
+You can read about the `*.cancel` key [below](#cancel-an-ongoing-rebalance-process-if-needed).
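
For illustration, the same transition could be sketched in Java against a deliberately simplified meta-storage facade. This is only a sketch: the `MetaStore` interface, its `invokeAtomically` method and the flat string keys are assumptions made for readability, not the actual Ignite meta-storage API.

```java
/** Hypothetical, simplified meta-storage facade used only for this sketch. */
interface MetaStore {
    /** Runs the given update as one atomic, all-or-nothing invoke. */
    void invokeAtomically(Runnable update);

    byte[] get(String key);

    void put(String key, byte[] value);

    void remove(String key);
}

final class RebalanceStateUpdater {
    private static final byte[] EMPTY = new byte[0];

    private final MetaStore metaStore;

    RebalanceStateUpdater(MetaStore metaStore) {
        this.metaStore = metaStore;
    }

    /** Advances the rebalance keys of the given zone after a successful rebalance. */
    void onRebalanceDone(String zoneId, byte[] newPeers) {
        metaStore.invokeAtomically(() -> {
            // The just-finished assignment becomes the stable one.
            metaStore.put(zoneId + ".assignment.stable", newPeers);

            // Any cancel marker for this rebalance is obsolete now.
            metaStore.remove(zoneId + ".assignment.cancel");

            byte[] planned = metaStore.get(zoneId + ".assignment.planned");

            if (planned == null) {
                // Nothing is queued: clear the pending assignment.
                metaStore.put(zoneId + ".assignment.pending", EMPTY);
            } else {
                // Promote the planned assignment to pending and drop the planned key.
                metaStore.put(zoneId + ".assignment.pending", planned);
                metaStore.remove(zoneId + ".assignment.planned");
            }
        });
    }
}
```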
+
+### 5. Stop the redundant replicas and update replica clients
 Here we need to:
 - Stop the redundant replicas, which are not in the current stable assignments
   - We can accidentally stop PrimaryReplica, so we need to use the algorithm 
of a graceful PrimaryReplica transfer, if needed.
 - Update the replication protocol clients (RaftGroupService, for example) on 
each Replica.
 
-### Failover logic
+## Failover logic
The main idea of the failover process: every rebalance request (PlacementDriver->PrimaryReplica or PrimaryReplica->ReplicationGroup) must be idempotent. So, in the worst case a redundant request should simply be answered positively (as if the rebalance were already done).
 
 After that we can prepare the following logic:
-- On every new PD leader elected - it must check the direct value (not the 
locally cached one) of `zoneId.assignment.pending` keys and send 
RebalanceRequest to needed PrimaryReplicas and then listen updates from the 
last revision.
-- On every PrimaryReplica reelection by PD it must send the new 
RebalanceRequest to PrimaryReplica, if pending key is not empty. 
-- On every leader reelection (for the leader oriented protocols) inside the 
replication group - leader send leaderElected event to PrimaryReplica, which 
force PrimaryReplica to send RebalanceRequest to the replication group leader 
again.
+- On every new PD leader election - it must check the direct values (not the locally cached ones) of the `zoneId.assignment.pending`/`zoneId.assignment.cancel` keys (the latter always wins, if it exists), send `RebalanceRequest`/`CancelRebalanceRequest` to the needed PrimaryReplicas and then listen for updates starting from the last revision of this key.
+- On every PrimaryReplica reelection by PD - it must send a new `RebalanceRequest`/`CancelRebalanceRequest` to the PrimaryReplica, if the pending/cancel key (cancel always wins, if filled) is not empty.
+- On every leader reelection (for the leader-oriented protocols) inside the replication group - the leader sends a leaderElected event to the PrimaryReplica, which forces the PrimaryReplica to send a `RebalanceRequest`/`CancelRebalanceRequest` to the replication group leader again.
+
+Moreover:
+- `RebalanceRequest`/`CancelRebalanceRequest` must include the revision of its trigger.
+- PrimaryReplica must persist the last seen revision locally.
+- When a new PrimaryReplica is elected, PlacementDriver must initialize the last seen revision of the PrimaryReplica to the current revision-1. So, after that PlacementDriver must send the *Request with the current actual revision.
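
As a rough sketch of this revision handling (class and method names here are illustrative assumptions, not the real module code), the PrimaryReplica-side staleness check could look roughly like this:

```java
/** Hypothetical sketch of the revision-based idempotency check on the PrimaryReplica side. */
final class RebalanceRequestHandler {
    /** Last trigger revision this PrimaryReplica has already processed (persisted locally in reality). */
    private long lastSeenRevision;

    RebalanceRequestHandler(long initialRevision) {
        // On PrimaryReplica election the PlacementDriver initializes this to currentRevision - 1,
        // so the first request carrying the current revision is always applied.
        this.lastSeenRevision = initialRevision;
    }

    /**
     * Handles a RebalanceRequest/CancelRebalanceRequest that carries the metastore revision
     * of its trigger. Returns true if the request was applied, false if it was a duplicate retry.
     * In both cases the caller answers positively, which is what makes retries safe.
     */
    synchronized boolean onRequest(long triggerRevision, Runnable forwardToReplicationGroup) {
        if (triggerRevision <= lastSeenRevision) {
            // Duplicate or outdated request from PD or from a re-elected leader: do nothing.
            return false;
        }

        forwardToReplicationGroup.run();

        lastSeenRevision = triggerRevision; // persisted locally in the real flow

        return true;
    }
}
```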

Review Comment:
   ```suggestion
   - When a new PrimaryReplica is elected, PlacementDriver must initialize the last seen revision of the PrimaryReplica to `currentRevision-1`. So, after that PlacementDriver must send the *Request with the current actual revision.
   ```



##########
modules/distribution-zones/tech-notes/rebalance.md:
##########
@@ -55,18 +55,62 @@ But for the new one we have an idea, which doesn't need the 
metastore at all:
 - On rebalanceDone/rebalanceError/leaderElected events the local event listener sends a message to the PrimaryReplica with the description of the event
 - If PrimaryReplica is not available - we should retry sending until the leader finds itself outdated (in this case, the new leader will send a leaderElected event to PrimaryReplica and receive the rebalance request again).
 
-### 4. Stop the redundant replicas and update replicas clients
+### 4. Update the rebalance state after successful rebalance
+Within a single atomic metastore invoke we must update the keys according to the following pseudo-code:
+```
+    metastoreInvoke: // atomic
+        zoneId.assignment.stable = newPeers
+        remove(zoneId.assignment.cancel)
+        if empty(zoneId.assignment.planned):
+            zoneId.assignment.pending = empty
+        else:
+            zoneId.assignment.pending = zoneId.assignment.planned
+            remove(zoneId.assignment.planned)
+```
+You can read about the `*.cancel` key [below](#cancel-an-ongoing-rebalance-process-if-needed).
+
+### 5. Stop the redundant replicas and update replica clients
 Here we need to:
 - Stop the redundant replicas, which are not in the current stable assignments
   - We can accidentally stop PrimaryReplica, so we need to use the algorithm 
of a graceful PrimaryReplica transfer, if needed.
 - Update the replication protocol clients (RaftGroupService, for example) on 
each Replica.
 
-### Failover logic
+## Failover logic
The main idea of the failover process: every rebalance request (PlacementDriver->PrimaryReplica or PrimaryReplica->ReplicationGroup) must be idempotent. So, in the worst case a redundant request should simply be answered positively (as if the rebalance were already done).

Review Comment:
   ```suggestion
   The main idea of the failover process: every rebalance request and cancel rebalance request (PlacementDriver->PrimaryReplica or PrimaryReplica->ReplicationGroup) must be idempotent. So, in the worst case a redundant request should simply be answered positively (as if the rebalance were already done).
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
