[ 
https://issues.apache.org/jira/browse/IGNITE-20187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763633#comment-17763633
 ] 

Kirill Gusakov commented on IGNITE-20187:
-----------------------------------------

LGTM

> Catch-up rebalance on node restart: assignments keys
> ----------------------------------------------------
>
>                 Key: IGNITE-20187
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20187
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Assignee: Aleksandr Polovtcev
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Motivation
> Prior to the implementation of meta storage compaction and the related 
> node restart updates, a node restored its volatile assignments state 
> through ms.watches starting from APPLIED_REVISION + 1. That is, after a 
> restart the node was notified about the state it had missed through 
> {*}the events{*}. This is no longer true: the new logic assumes that the 
> node registers an ms.watch starting from APPLIED_REVISION + X + 1 and 
> manually reads the local meta storage state for APPLIED_REVISION + X, 
> along with the related processing. Implementing that process is the 
> essence of this ticket.
> h3. Definition of Done
> As part of the node restart process, TableManager or a similar component 
> should manually read the local pending assignments keys (reading stable 
> assignments will be covered in a separate ticket) and schedule the 
> corresponding rebalance.
> h3. Implementation Notes
> It's possible that the assignments.pending keys will be stale at the moment 
> of processing. To overcome this issue, the following steps, common to the 
> current rebalance, are proposed:
>  # Start all new needed nodes {{partition.assignments.pending / 
> partition.assignments.stable}}
>  # After the nodes have started successfully, check whether the current node 
> is the leader of the raft group (the leader response must be updated by the 
> current term). If it is:
>  # Read the distributed {{partition.assignments.pending}}; if the retrieved 
> revision is less than or equal to the one retrieved by the initial local 
> read, run RaftGroupService#changePeersAsync(leaderTerm, peers). 
> RaftGroupService#changePeersAsync calls from old terms must be skipped.
> It seems that 
> [https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md]
>  should also be updated a bit.
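
The leader and revision checks from the Implementation Notes could be sketched roughly as follows. This is only an illustration under stated assumptions: RaftClient is a hypothetical stand-in that models nothing but the changePeersAsync(leaderTerm, peers) call named in the ticket, and the class name, method names, and parameters are invented for the example rather than taken from the Ignite code base.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class PendingRebalanceSketch {
    /** Hypothetical stand-in for the raft client; only the call named in the ticket is modelled. */
    public interface RaftClient {
        CompletableFuture<Void> changePeersAsync(long leaderTerm, List<String> peers);
    }

    /**
     * The local read is treated as current when the distributed pending key
     * has a revision less than or equal to the one seen by the local read.
     */
    public static boolean isLocalReadCurrent(long distributedRevision, long localRevision) {
        return distributedRevision <= localRevision;
    }

    /** A changePeers request issued under an older term than the current one must be skipped. */
    public static boolean isFromOldTerm(long requestTerm, long currentTerm) {
        return requestTerm < currentTerm;
    }

    /**
     * Calls changePeersAsync only if this node is the leader and the local
     * pending assignments are not stale. Completes with true if the call was
     * made, false if it was skipped.
     */
    public static CompletableFuture<Boolean> maybeChangePeers(
            RaftClient raft,
            boolean isLeader,
            long leaderTerm,
            long localRevision,
            long distributedRevision,
            List<String> peers) {
        if (!isLeader || !isLocalReadCurrent(distributedRevision, localRevision)) {
            return CompletableFuture.completedFuture(false);
        }
        return raft.changePeersAsync(leaderTerm, peers).thenApply(v -> true);
    }

    public static void main(String[] args) {
        // Assumed in-memory client that just logs the request.
        RaftClient raft = (term, peers) -> {
            System.out.println("changePeers term=" + term + " peers=" + peers);
            return CompletableFuture.completedFuture(null);
        };

        // Leader, local read is up to date: the call is made.
        boolean called = maybeChangePeers(raft, true, 3L, 10L, 10L, List.of("A", "B")).join();
        System.out.println("called=" + called);

        // Leader, but the distributed key is newer than the local read: skipped.
        boolean skipped = maybeChangePeers(raft, true, 3L, 10L, 11L, List.of("A", "B")).join();
        System.out.println("called=" + skipped);
    }
}
```

The revision comparison is kept as a separate pure function so the staleness rule can be checked without a raft group. How the leader term is obtained and refreshed is left out of the sketch.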



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
