[
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561092#comment-16561092
]
Maxim Muzafarov commented on IGNITE-7165:
-----------------------------------------
h5. Changes ready
* TC: [#3025 (27 Jul 18
20:00)|https://ci.ignite.apache.org/viewLog.html?buildId=1554633&tab=buildResultsDiv&buildTypeId=IgniteTests24Java8_RunAll]
* PR: [#4442|https://github.com/apache/ignite/pull/4442]
* Upsource:
[IGNT-CR-699|https://reviews.ignite.apache.org/ignite/review/IGNT-CR-699]
h5. Implementation details
# _Keep rebalance version_
The rebalance version is no longer necessarily the last affinity topology
version. To calculate the affinity assignment difference against the last
topology version, we must save the version on which rebalance is currently
running. It is kept in the exchange thread (see sketch 1 below).
# _{{LEFT\FAIL}} events trigger rebalance_
Each cache group has a collection of supplier nodes from which partitions are
to be retrieved -- {{Map<> remaining}}. If any node from this collection
{{LEFT\FAIL}}s the cluster, rebalance must be restarted (see sketch 2 below).
# _{{onLocalJoin}} event triggers rebalance_
Partition state is changed OWNING → MOVING on the coordinator due to an
obsolete partition update counter. The coordinator performs PME and, after
merging all SingleMessages, marks partitions with an obsolete update sequence
to be demanded from remote nodes (by changing partition state OWNING → MOVING).
See sketch 3 below.
# _{{empty}} affinity history triggers rebalance_
A cache group can be started much later (not at the local join event). Such a
cache group has no affinity history to compare with the latest affinity, so
rebalance is triggered unconditionally (see sketch 4 below).
# _Supply context map clearing changed_
Previously, the supply context map was cleared after each topology version
change. Since rebalance can now be performed on a non-latest topology version,
this behavior had to change: the context is cleared only for nodes that left
or failed the topology (see sketch 5 below).
# _New condition in {{topologyChanged()}}_
PME prepares partitions to be {{RENTED}} or {{EVICTED}} if they are not
assigned to the local node according to the new affinity calculation.
Processing a stale supply message (sent for a previous version) can lead to
exceptions when getting partitions on the local node in an incorrect state.
That's why a stale {{GridDhtPartitionSupplyMessage}} must be ignored by the
{{Demander}} (see sketch 6 below).
# _REPLICATED cache processing_
The affinity assignment for this type of cache group never changes.
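Sketch 1, for item 1 above: a minimal self-contained model of keeping the
rebalance version separately from the latest one. All names ({{TopVer}},
{{rebTopVer}}, {{affHist}}) are simplified stand-ins for this comment, not
Ignite's actual internals:
{code:java}
import java.util.List;
import java.util.Map;

public class RebalanceVersionSketch {
    /** Simplified topology version: major + minor counters. */
    record TopVer(long topVer, int minorTopVer) {}

    /** Version on which rebalance is currently running (kept by the exchange thread). */
    private TopVer rebTopVer;

    /** Affinity history: version -> assignment (partition -> ordered owner node ids). */
    private final Map<TopVer, List<List<String>>> affHist;

    RebalanceVersionSketch(Map<TopVer, List<List<String>>> affHist) {
        this.affHist = affHist;
    }

    /** Remember the version rebalance was scheduled on. */
    void onRebalanceScheduled(TopVer ver) {
        rebTopVer = ver;
    }

    /**
     * A new exchange restarts rebalance only if the assignment actually differs
     * between the saved rebalance version and the latest one; a client join
     * leaves the assignment unchanged, so rebalance keeps running.
     * Assumes both versions are present in the history (the missing-history
     * case is item 4, sketch 4).
     */
    boolean assignmentChanged(TopVer latestVer) {
        return !affHist.get(rebTopVer).equals(affHist.get(latestVer));
    }
}
{code}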
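Sketch 2, for item 2: restart rebalance only when the node that left or failed
was actually one of our suppliers. The shape of {{remaining}} below is an
assumption made for the sketch:
{code:java}
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class SupplierFailureSketch {
    /** Supplier node id -> partitions still to be demanded from it. */
    private final Map<UUID, Set<Integer>> remaining;

    SupplierFailureSketch(Map<UUID, Set<Integer>> remaining) {
        this.remaining = remaining;
    }

    /**
     * Called on NODE_LEFT / NODE_FAILED discovery events. Rebalance must be
     * restarted only if the leaving node was one of our suppliers; unrelated
     * nodes (e.g. clients) do not affect it.
     */
    boolean rebalanceRestartRequired(UUID nodeId) {
        return remaining.containsKey(nodeId);
    }
}
{code}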
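Sketch 3, for item 3: after merging SingleMessages, the coordinator demotes
local partitions whose update counter is behind the cluster-wide maximum. The
types below are simplified models, not Ignite's partition map classes:
{code:java}
import java.util.Map;

public class ObsoleteCounterSketch {
    enum PartState { OWNING, MOVING }

    /** Simplified local partition: state + update counter. */
    static final class Part {
        PartState state = PartState.OWNING;
        long updCntr;
    }

    /**
     * Runs on the coordinator after all SingleMessages are merged: a local
     * partition whose counter is behind the cluster-wide maximum is demoted
     * OWNING -> MOVING, so it will be demanded from remote nodes.
     */
    static void demoteStale(Map<Integer, Part> locParts, Map<Integer, Long> maxCntrs) {
        for (Map.Entry<Integer, Part> e : locParts.entrySet()) {
            Long max = maxCntrs.get(e.getKey());

            if (max != null && e.getValue().updCntr < max)
                e.getValue().state = PartState.MOVING;
        }
    }
}
{code}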
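Sketch 4, for item 4: the same comparison idea as sketch 1, extended with the
empty-history fallback for cache groups started after the local join. The
history shape (assignment reduced to a hash) is an illustrative assumption:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class EmptyHistorySketch {
    /** Affinity history of one cache group: topVer -> assignment hash (simplified). */
    private final Map<Long, Integer> affHist = new HashMap<>();

    void onAffinityCalculated(long topVer, int assignmentHash) {
        affHist.put(topVer, assignmentHash);
    }

    boolean rebalanceRequired(long rebTopVer, long latestTopVer) {
        // The group was started after rebTopVer, so there is no history entry
        // to diff against -- force rebalance instead of comparing assignments.
        if (!affHist.containsKey(rebTopVer))
            return true;

        return !affHist.get(rebTopVer).equals(affHist.get(latestTopVer));
    }
}
{code}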
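Sketch 5, for item 5: cleanup is narrowed from "clear everything on every
exchange" to "remove only the contexts of nodes that left or failed".
{{SupplyContext}} and the map key are simplified placeholders:
{code:java}
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class SupplyContextCleanupSketch {
    /** Placeholder for per-demander supply state (iterator position, etc.). */
    static final class SupplyContext {
    }

    /** Demander node id -> supply context. */
    private final Map<UUID, SupplyContext> scMap = new ConcurrentHashMap<>();

    /**
     * Old behavior was effectively scMap.clear() on every topology change.
     * New behavior: contexts of alive demanders survive the exchange.
     */
    void onNodeLeftOrFailed(UUID nodeId) {
        scMap.remove(nodeId);
    }
}
{code}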
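Sketch 6, for item 6: the demand side drops a supply message generated for an
older topology version, since PME may already have prepared the referenced
partitions for {{RENTED}}/{{EVICTED}}. The types below are simplified stand-ins
for {{GridDhtPartitionSupplyMessage}} and {{AffinityTopologyVersion}}:
{code:java}
public class StaleSupplySketch {
    /** Simplified topology version, ordered by (major, minor). */
    record TopVer(long major, int minor) implements Comparable<TopVer> {
        @Override public int compareTo(TopVer o) {
            int c = Long.compare(major, o.major);
            return c != 0 ? c : Integer.compare(minor, o.minor);
        }
    }

    /** Version the demander is currently rebalancing on. */
    private final TopVer rebTopVer;

    StaleSupplySketch(TopVer rebTopVer) {
        this.rebTopVer = rebTopVer;
    }

    /**
     * A supply message produced for an earlier version than the current
     * rebalance version is stale and must be ignored: processing it could
     * touch local partitions already prepared for eviction.
     */
    boolean stale(TopVer msgTopVer) {
        return msgTopVer.compareTo(rebTopVer) < 0;
    }
}
{code}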
> Re-balancing is cancelled if client node joins
> ----------------------------------------------
>
> Key: IGNITE-7165
> URL: https://issues.apache.org/jira/browse/IGNITE-7165
> Project: Ignite
> Issue Type: Bug
> Reporter: Mikhail Cherkasov
> Assignee: Maxim Muzafarov
> Priority: Critical
> Labels: rebalance
> Fix For: 2.7
>
>
> Re-balancing is cancelled if a client node joins. Re-balancing can take
> hours, and each time a client node joins it starts again:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
> Added new node to topology: TcpDiscoveryNode
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1,
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0,
> /172.31.16.213:0], discPort=0, order=36, intOrder=24,
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe,
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
> Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0],
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef,
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
> Finish exchange future [startVer=AffinityTopologyVersion [topVer=36,
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0],
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0],
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
> Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
> [topVer=36, minorTopVer=0], evt=NODE_JOINED,
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
> Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
> Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
> Rebalancing started [top=null, evt=NODE_JOINED,
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
> Starting rebalancing [mode=ASYNC,
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18,
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0],
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
> Starting rebalancing [mode=ASYNC,
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15,
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0],
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
> Starting rebalancing [mode=ASYNC,
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15,
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0],
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
> Starting rebalancing [mode=ASYNC,
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12,
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0],
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
> Starting rebalancing [mode=ASYNC,
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11,
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0],
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
> Starting rebalancing [mode=ASYNC,
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18,
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0],
> updateSeq=-1754630006]
> So in clusters with a large amount of data and frequent client leave/join
> events, this means that a new server will never receive its partitions.