[
https://issues.apache.org/jira/browse/IGNITE-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amelchev Nikita updated IGNITE-15364:
-------------------------------------
Release Note: Fixed rebalance issue when historical rebalancing is
reassigned after the client node joined the cluster. (was: Fixed rebalance
issue.)
> The rebalancing can be broken if historical rebalancing is reassigned after
> the client node joined the cluster.
> ---------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-15364
> URL: https://issues.apache.org/jira/browse/IGNITE-15364
> Project: Ignite
> Issue Type: Bug
> Reporter: Vyacheslav Koptilin
> Assignee: Vyacheslav Koptilin
> Priority: Major
> Fix For: 2.13
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The following scenario appears to break data consistency after
> rebalancing:
> - Start and activate a cluster of three server nodes.
> - Create a cache with two backups and load initial data into it.
> - Stop one server node and load additional data into the cache, so that
> historical rebalance is triggered when the node returns to the cluster.
> - Restart the node. Make sure that historical rebalancing starts from the
> two other nodes.
> - Before rebalancing completes, start a new client node and let it join the
> cluster. This cleans up the partition update counters on the server nodes,
> i.e. _GridDhtPartitionTopologyImpl#cntrMap_. ( * )
> - Historical rebalancing from one node fails.
> - In that case, rebalancing is reassigned and the starting node tries to
> rebalance the missed partitions from another node.
> Unfortunately, update counters for historical rebalance cannot be properly
> calculated because of ( * ).
> An additional issue that was found while debugging:
> RebalanceReassignExchangeTask is skipped under some circumstances
> {code:java|title=GridCachePartitionExchangeManager.ExchangeWorker#body0}
> else if (lastAffChangedVer.after(exchId.topologyVersion())) {
>     // There is a new exchange which should trigger rebalancing.
>     // This reassignment request can be skipped.
>     if (log.isInfoEnabled()) {
>         log.info("Partitions reassignment request skipped due to affinity was already changed" +
>             " [reassignTopVer=" + exchId.topologyVersion() +
>             ", lastAffChangedTopVer=" + lastAffChangedVer +
>             ']');
>     }
> {code}
> In some cases the current rebalance is not canceled by a PME that updates
> only the minor topology version, and _RebalanceReassignExchangeTask_ is then
> triggered because of missed partitions on the supplier. That task is
> skipped, because the current minor version is higher than the rebalance
> topology version. As a result, the missed partitions on the demander remain
> in the MOVING state until the next PME triggers another rebalance.
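The skip condition described above can be illustrated with a minimal, self-contained Java sketch. It does not use Ignite classes: TopVer is a hypothetical stand-in for a topology version with a major and a minor part, and reassignSkipped is a hypothetical helper that mirrors only the lastAffChangedVer.after(...) comparison from the snippet. It assumes, as the description suggests, that a minor-only PME leaves the last affinity-changed version ahead of the version the reassignment was requested for.

```java
public class ReassignSkipSketch {
    /** Hypothetical stand-in for a topology version (major part plus minor part). */
    static final class TopVer implements Comparable<TopVer> {
        final long major;
        final int minor;

        TopVer(long major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        /** Orders by major version first, then by minor version. */
        @Override public int compareTo(TopVer o) {
            int cmp = Long.compare(major, o.major);
            return cmp != 0 ? cmp : Integer.compare(minor, o.minor);
        }

        /** True if this version is strictly newer than the other one. */
        boolean after(TopVer o) {
            return compareTo(o) > 0;
        }
    }

    /** Hypothetical helper: mirrors only the comparison that guards the skip branch. */
    static boolean reassignSkipped(TopVer lastAffChangedVer, TopVer reassignTopVer) {
        return lastAffChangedVer.after(reassignTopVer);
    }

    public static void main(String[] args) {
        TopVer rebalanceVer = new TopVer(5, 0);  // version the rebalance was started on
        TopVer afterMinorPme = new TopVer(5, 1); // assumed result of a minor-only PME

        // The minor bump alone is enough for the reassignment to be skipped.
        System.out.println(reassignSkipped(afterMinorPme, rebalanceVer)); // prints "true"

        // Without any newer version the reassignment would be processed.
        System.out.println(reassignSkipped(rebalanceVer, rebalanceVer)); // prints "false"
    }
}
```

In this toy model a bump of the minor part alone makes the comparison return true, which matches the reported behavior where the reassignment is dropped even though no new rebalance has been scheduled for the missed partitions.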