[
https://issues.apache.org/jira/browse/IGNITE-20187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Lapin updated IGNITE-20187:
-------------------------------------
Description:
h3. Motivation
Prior to the implementation of the meta storage compaction and the related node
restart updates, a node restored its volatile assignments state through
ms.watches starting from APPLIED_REVISION + 1, meaning that after a restart the
node was notified about any missed state through {*}the events{*}. However,
this is no longer true: the new logic assumes that the node registers its
ms.watch starting from APPLIED_REVISION + X + 1 and manually reads the local
meta storage state up to APPLIED_REVISION + X, along with the related
processing. Implementing that process is the essence of this ticket.
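The catch-up flow above can be sketched as follows; this is a minimal model, not the actual Ignite 3 API, and all names in it are illustrative assumptions. The node first replays the locally persisted revisions in the gap (APPLIED_REVISION, APPLIED_REVISION + X] via direct local reads, and only then registers the watch starting from APPLIED_REVISION + X + 1, so no revision is processed twice and none is skipped:

```java
import java.util.NavigableMap;
import java.util.function.LongConsumer;

// Hedged sketch: models the restart flow described above with a plain
// NavigableMap standing in for the local meta storage. Hypothetical names.
final class RestartCatchUp {
    /**
     * Replays revisions (appliedRevision, appliedRevision + x] by direct
     * local reads and returns the revision from which the ms.watch should
     * then be registered, i.e. appliedRevision + x + 1.
     */
    static long catchUp(NavigableMap<Long, String> localMetaStorage,
                        long appliedRevision,
                        long x,
                        LongConsumer processRevision) {
        // Manual local read of the gap left uncovered by the watch.
        localMetaStorage
                .subMap(appliedRevision, false, appliedRevision + x, true)
                .keySet()
                .forEach(processRevision::accept);

        // The watch picks up from the next revision onward.
        return appliedRevision + x + 1;
    }
}
```

For example, with APPLIED_REVISION = 2 and X = 2, revisions 3 and 4 are processed via the local read and the watch is registered from revision 5.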
h3. Definition of Done
Within the node restart process, TableManager (or a similar component) should
manually read the local assignments pending keys (reading assignments stable
will be covered in a separate ticket) and schedule the corresponding rebalance.
h3. Implementation Notes
It's possible that the assignments.pending keys will be stale at the moment of
processing, so to overcome this issue the following steps, common with the
current rebalance flow, are proposed:
# Start all the needed new nodes from {{partition.assignments.pending}} /
{{partition.assignments.stable}}.
# After the successful starts, check whether the current node is the leader of
the raft group (the leader response must be up to date with the current term).
# If it is, read the distributed {{partition.assignments.pending}} and, if the
retrieved revision is less than or equal to the one retrieved within the
initial local read, run RaftGroupService#changePeersAsync(leaderTerm, peers).
RaftGroupService#changePeersAsync calls from old terms must be skipped.
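The two guards in the last steps can be sketched as below; this is an illustrative sketch with hypothetical names (the source only names RaftGroupService#changePeersAsync), not the actual implementation:

```java
// Hedged sketch of the step-3 guards. The distributed read of
// partition.assignments.pending only triggers changePeersAsync when its
// revision does not exceed the revision seen in the initial local read
// (otherwise a newer, watch-driven rebalance already covers it), and
// changePeersAsync invocations carrying a stale leader term are dropped.
final class CatchUpRebalanceGuards {
    /** True when the catch-up rebalance should invoke changePeersAsync. */
    static boolean shouldChangePeers(long distributedRevision, long localReadRevision) {
        return distributedRevision <= localReadRevision;
    }

    /** True when a changePeersAsync call from {@code callTerm} must be skipped. */
    static boolean isStaleTerm(long callTerm, long currentLeaderTerm) {
        return callTerm < currentLeaderTerm;
    }
}
```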
It seems that
https://github.com/apache/ignite-3/blob/main/modules/table/tech-notes/rebalance.md
should also be updated a bit.
> Catch-up rebalance on node restart: assignments keys
> ----------------------------------------------------
>
> Key: IGNITE-20187
> URL: https://issues.apache.org/jira/browse/IGNITE-20187
> Project: Ignite
> Issue Type: Improvement
> Reporter: Alexander Lapin
> Priority: Major
> Labels: ignite-3
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)