[ 
https://issues.apache.org/jira/browse/IGNITE-23566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896771#comment-17896771
 ] 

Alexander Lapin commented on IGNITE-23566:
------------------------------------------

Sounds reasonable.

> Investigate possible races between resetPartitions and infinite rebalance 
> retries
> ---------------------------------------------------------------------------------
>
>                 Key: IGNITE-23566
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23566
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Kirill Gusakov
>            Assignee: Kirill Gusakov
>            Priority: Major
>              Labels: ignite-3
>
> *Motivation*
> For now, our rebalance fail-over is a trivial infinite loop of retries:
> - on any issue during the catch-up phase or later, we call the
> onReconfigurationError listener
> - for now, this listener just counts the retries and calls the
> changePeersAndLearnersAsync logic again and again
> At the same time, we can call the resetPartitions logic and rewrite pending
> assignments, potentially at any moment. So, we can have a race between
> rebalance retries and resetPartitions.
> *Definition of done*
> Under this ticket, we need to investigate all possible issues, if any, and
> create appropriate tickets to resolve them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
