kgusakov commented on code in PR #894:
URL: https://github.com/apache/ignite-3/pull/894#discussion_r904956328


##########
modules/table/tech-notes/rebalance.md:
##########
@@ -1,92 +1,80 @@
 # How to read this doc
 Every algorithm phase has the following main sections:
-- Trigger - how current phase will be invoked
-- Steps/Pseudocode - the main logical steps of the current phase
-- Result (optional, if pseudocode provided) - events and system state changes, 
which this phase produces
+- Trigger – how the current phase is invoked
+- Steps/Pseudocode – the main logical steps of the current phase
+- Result (optional, if pseudocode is provided) – events and system state 
changes which this phase produces
 
 # Rebalance algorithm
 ## Short algorithm description
  - One of the operations that can trigger rebalance occurred:
     
-    Write new baseline to metastore (effectively from 1 node in cluster)
-
-    OR
-    
     Write new replicas configuration number to table config (effectively from 
1 node)
     
     OR
     
     Write new partitions configuration number to table config (effectively 
from 1 node)
 - Write new assignments' intention to metastore (effectively from 1 node in 
cluster)
-- Start new raft nodes. Initiate/update change peer request to raft group 
(effectively from 1 node per partition)
+- Start new raft nodes. Initiate/update asynchronous change peer request to 
raft group (effectively from 1 node per partition)
 - Stop all redundant nodes. Change stable partition assignment to the new one 
and finish rebalance process.
 
 ## New metastore keys
 For further steps, we should introduce some new metastore keys:
-- `partition.assignments.stable` - the list of peers, which process operations 
for partition at the current moment.
+- `partition.assignments.stable` - the list of peers that process operations 
for a partition at the current moment.
 - `partition.assignments.pending` - the list of peers to which the current 
rebalance moves the partition.
 - `partition.assignments.planned` - the list of peers that will be used for 
the next rebalance, once the current one finishes.
 
 Also, we will need the utility key:
-- `partition.assignments.change.trigger.revision` - the key, needed for 
processing the event about assignments' update trigger only once.
+- `partition.change.trigger.revision` - the key needed to process each 
assignments' update trigger event only once.
 
 ## Operations that can trigger rebalance
-Three types of events can trigger the rebalance:
+Two types of events can trigger the rebalance:
-- Change of baseline metastore key (1 for all tables for now, but maybe it 
should be separate per table in future)
 - Configuration change through 
`org.apache.ignite.configuration.schemas.table.TableChange.changeReplicas` 
produces a metastore update event
 - Configuration change through 
`org.apache.ignite.configuration.schemas.table.TableChange.changePartitions` 
produces a metastore update event (IMPORTANT: this type of trigger has 
additional difficulties because of cross-raft-group data migration, and it is 
out of scope of this document)
 
-**Result**: So, one of three metastore keys' changes will trigger rebalance:
+**Result**: So, one of two metastore keys' changes will trigger rebalance:
 ```
-<global>.baseline
 <tableScope>.replicas
 <tableScope>.partitions // out of scope
 ```
 ## Write new pending assignments (1)
 **Trigger**:
-- Metastore event about change in `<global>.baseline`
-- Metastore event about changes in `<tableScope>.replicas`
+- Metastore event about changes in `<tableScope>.replicas` (See 
`org.apache.ignite.internal.table.distributed.TableManager.onUpdateReplicas`)
 
 **Pseudocode**:
-```
-onBaselineEvent:
-    for table in tableCfg.tables():
-        for partition in table.partitions:
-            <inline metastoreInvoke>
-            
+```
 onReplicaNumberChange:
     with table as event.table:
         for partition in table.partitions:
             <inline metastoreInvoke>
 
 metastoreInvoke: // atomic metastore call through multi-invoke api
-    if empty(partition.assignments.change.trigger.revision) || 
partition.assignments.change.trigger.revision < event.revision:
+    if empty(partition.change.trigger.revision) || 
partition.change.trigger.revision < event.revision:
         if empty(partition.assignments.pending) && partition.assignments.stable != calcPartAssignments():
             partition.assignments.pending = calcPartAssignments() 
-            partition.assignments.change.trigger.revision = event.revision
+            partition.change.trigger.revision = event.revision
         else:
             if partition.assignments.pending != calcPartAssignments():
                 partition.assignments.planned = calcPartAssignments()
-                partition.assignments.change.trigger.revision = event.revision
+                partition.change.trigger.revision = event.revision
             else:
                 remove(partition.assignments.planned)
     else:
         skip
 ```
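The `metastoreInvoke` pseudocode above can be sketched in plain Java. This is an illustrative in-memory model only (plain field writes stand in for the atomic multi-invoke call; the `onTrigger` method and field names are hypothetical), showing the three-way decision: start a rebalance, queue a planned one, or drop an obsolete planned one:

```java
import java.util.List;
import java.util.Objects;

// Illustrative in-memory model of the atomic multi-invoke above.
// Hypothetical: in the real system this whole method body executes as one
// conditional metastore update, not as plain Java field assignments.
public class RebalanceTrigger {
    Long changeTriggerRevision;  // partition.change.trigger.revision
    List<String> stable;         // partition.assignments.stable
    List<String> pending;        // partition.assignments.pending (null = empty)
    List<String> planned;        // partition.assignments.planned (null = empty)

    // eventRevision: revision of the metastore event being processed.
    // newAssignments: the result of calcPartAssignments() in the pseudocode.
    public void onTrigger(long eventRevision, List<String> newAssignments) {
        // Process each trigger revision at most once.
        if (changeTriggerRevision != null && changeTriggerRevision >= eventRevision) {
            return; // skip: this (or a newer) trigger was already handled
        }
        if (pending == null && !Objects.equals(stable, newAssignments)) {
            // No rebalance in flight: start one toward the new assignments.
            pending = newAssignments;
            changeTriggerRevision = eventRevision;
        } else if (!Objects.equals(pending, newAssignments)) {
            // A rebalance is in flight toward a different target:
            // queue the new assignments as the planned (next) rebalance.
            planned = newAssignments;
            changeTriggerRevision = eventRevision;
        } else {
            // The in-flight rebalance already targets the new assignments,
            // so any previously planned rebalance is obsolete.
            planned = null;
        }
    }

    public static void main(String[] args) {
        RebalanceTrigger t = new RebalanceTrigger();
        t.stable = List.of("node1", "node2");
        t.onTrigger(1, List.of("node1", "node3"));
        System.out.println(t.pending); // the in-flight rebalance target
    }
}
```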
 
 ## Start new raft nodes and initiate change peers (2)
-**Trigger**: Metastore event about new `partition.assignments.pending` received
+**Trigger**: Metastore event about new `partition.assignments.pending` 
received (See corresponding listener for pending key in 
`org.apache.ignite.internal.table.distributed.TableManager.registerRebalanceListeners`)
 
 **Steps**:
 - Start all new needed nodes, i.e. the set difference 
`partition.assignments.pending / partition.assignments.stable` (peers present 
in pending but not in stable)
-- After successful starts - check if current node is the leader of raft group 
(leader response must be updated by current term) and `changePeers(leaderTerm, 
peers)`. `changePeers` from old terms must be skipped.
+- After successful starts, check if the current node is the leader of the 
raft group (the leader response must be up to date with the current term) and 
run `RaftGroupService#changePeersAsync(leaderTerm, peers)`. 
`RaftGroupService#changePeersAsync` calls from old terms must be skipped.
 
 **Result**:
 - New needed raft nodes started
 - Change peers state initiated for every raft group
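The "start all new needed nodes" step above is a set difference between the pending and stable assignments; a minimal sketch (hypothetical helper, not actual Ignite code):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NodesToStart {
    // pending / stable in the doc's notation: peers in the pending
    // assignment that are not already running in the stable one.
    static Set<String> nodesToStart(List<String> pending, List<String> stable) {
        Set<String> result = new HashSet<>(pending);
        result.removeAll(stable);
        return result;
    }

    public static void main(String[] args) {
        List<String> stable = List.of("node1", "node2", "node3");
        List<String> pending = List.of("node1", "node2", "node4");
        System.out.println(nodesToStart(pending, stable)); // prints [node4]
    }
}
```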
 
-## When changePeers done inside the raft group - stop all redundant nodes
-**Trigger**: When leader applied new Configuration with list of resulting 
peers `<applied peer>`, it calls `onChangePeersCommitted(<applied peers>)`
+## When RaftGroupService#changePeersAsync is done inside the raft group - 
update the stable key and stop all redundant nodes

Review Comment:
   Could you add a few words about the assignments' configuration update and 
the further updates of raft clients for the table?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
