[jira] [Issue Comment Deleted] (IGNITE-7165) Re-balancing is cancelled if client node joins

Dmitry Sherstobitov (JIRA) Tue, 14 Aug 2018 03:00:27 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-7165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dmitry Sherstobitov updated IGNITE-7165:
----------------------------------------
    Comment: was deleted

(was: I'm afraid I cannot provide a proper Java reproducer.

Attached is the log from the node with cleared LFS [^node-NO_REBALANCE-7165.log]

It contains some messages with "Skipping rebalancing (no affinity changes)" 
after a node joins the cluster, whereas in the previous version the following 
text appeared in the log:

{code:java}
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] Topology 
snapshot [ver=18, servers=4, clients=0, CPUs=32, offheap=75.0GB, heap=120.0GB]
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- Node 
[id=61E12BC1-31A0-473A-BF79-DDD51C879722, clusterState=ACTIVE]
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- 
Baseline [id=0, size=4, online=4, offline=0]
[12:53:44,127][INFO][disco-event-worker-#61][GridDiscoveryManager] Data Regions 
Configured:
[12:53:44,128][INFO][disco-event-worker-#61][GridDiscoveryManager]   ^-- 
default [initSize=256.0 MiB, maxSize=18.8 GiB, persistenceEnabled=true]
[12:53:44,128][INFO][exchange-worker-#62][time] Started exchange init 
[topVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], crd=false, 
evt=NODE_FAILED, evtNode=02e72065-13c8-4b47-a905-874d723cc3c1, customEvt=null, 
allowMerge=true]
[12:53:44,129][INFO][exchange-worker-#62][GridDhtPartitionsExchangeFuture] 
Finish exchange future [startVer=AffinityTopologyVersion [topVer=18, 
minorTopVer=0], resVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], 
err=null]
[12:53:44,130][INFO][exchange-worker-#62][time] Finished exchange init 
[topVer=AffinityTopologyVersion [topVer=18, minorTopVer=0], crd=false]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_1_028], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_3_088], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_1_015], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_4_118], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext 
[grp=cache_group_2_058], topVer=AffinityTopologyVersion [topVer=17, 
minorTopVer=0], rebalanceId=6]
[12:53:44,141][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_6], 
topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_5], 
topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_4], 
topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_3], 
topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_2], 
topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Cancelled 
rebalancing from all nodes [topology=AffinityTopologyVersion [topVer=17, 
minorTopVer=0]]
[12:53:44,142][INFO][exchange-worker-#62][GridDhtPartitionDemander] Completed 
rebalance future: RebalanceFuture [grp=CacheGroupContext [grp=cache_group_1], 
topVer=AffinityTopologyVersion [topVer=17, minorTopVer=0], rebalanceId=6]
[12:53:44,143][INFO][exchange-worker-#62][GridCachePartitionExchangeManager] 
Rebalancing scheduled [order=[ignite-sys-cache, cache_group_2_031, 
cache_group_1, cache_group_2, cache_group_3, cache_group_4, cache_group_5, 
cache_group_6, cache_group_2_058, cache_group_4_118, cache_group_1_015, 
cache_group_3_088, cache_group_1_028]]
[12:53:44,143][INFO][exchange-worker-#62][GridCachePartitionExchangeManager] 
Rebalancing started [top=AffinityTopologyVersion [topVer=18, minorTopVer=0], 
evt=NODE_FAILED, node=02e72065-13c8-4b47-a905-874d723cc3c1]
[12:53:44,143][INFO][exchange-worker-#62][GridDhtPartitionDemander] Starting 
rebalancing [grp=ignite-sys-cache, mode=SYNC, 
fromNode=2cb374ac-8fff-4235-8149-2d05d629c3d2, partitionsCount=25, 
topology=AffinityTopologyVersion [topVer=18, minorTopVer=0], rebalanceId=9]
{code}
 

 )
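
The difference described in the deleted comment (a "Skipping rebalancing" 
message versus a cancel/restart sequence) can be checked mechanically in a 
log. The sketch below is a hypothetical helper, not part of Ignite. It assumes 
the log text is available as a string, that entries start with a 
[HH:MM:SS,mmm] timestamp, and that the message fragments match the ones 
quoted in this thread. The names PATTERNS, unwrap and classify are 
illustrative only.

```python
import re
from collections import Counter

# Message fragments copied from the logs quoted in this thread.
PATTERNS = {
    "skipped_no_affinity_changes": "Skipping rebalancing (no affinity changes)",
    "skipped_nothing_scheduled": "Skipping rebalancing (nothing scheduled)",
    "cancelled": "Cancelled rebalancing from all nodes",
    "started": "Starting rebalancing",
}

# Assumption: every log entry begins with a timestamp such as [12:53:44,141].
ENTRY_START = re.compile(r"\[\d\d:\d\d:\d\d,\d{3}\]")


def unwrap(log_text):
    """Join mail-wrapped continuation lines back onto their log entry."""
    entries = []
    for raw in log_text.splitlines():
        # Drop e-mail quote markers ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def classify(log_text):
    """Count how many entries contain each known rebalancing message."""
    counts = Counter()
    for entry in unwrap(log_text):
        for name, fragment in PATTERNS.items():
            if fragment in entry:
                counts[name] += 1
    return counts
```

A log where "cancelled" and "started" rise together after each topology 
change would match the behaviour reported in the issue, while a log with only 
"skipped" entries would match the newer behaviour mentioned in the comment.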

> Re-balancing is cancelled if client node joins
> ----------------------------------------------
>
>                 Key: IGNITE-7165
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7165
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Cherkasov
>            Assignee: Maxim Muzafarov
>            Priority: Critical
>              Labels: rebalance
>             Fix For: 2.7
>
>         Attachments: node-NO_REBALANCE-7165.log
>
>
> Re-balancing is cancelled whenever a client node joins. Re-balancing can take 
> hours, and each time a client node joins it starts over:
> [15:10:05,700][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Added new node to topology: TcpDiscoveryNode 
> [id=979cf868-1c37-424a-9ad1-12db501f32ef, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 
> 172.31.16.213], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, 
> /172.31.16.213:0], discPort=0, order=36, intOrder=24, 
> lastExchangeTime=1512907805688, loc=false, ver=2.3.1#20171129-sha1:4b1ec0fe, 
> isClient=true]
> [15:10:05,701][INFO][disco-event-worker-#61%statement_grid%][GridDiscoveryManager]
>  Topology snapshot [ver=36, servers=7, clients=5, CPUs=128, heap=160.0GB]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Started 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false, evt=NODE_JOINED, evtNode=979cf868-1c37-424a-9ad1-12db501f32ef, 
> customEvt=null, allowMerge=true]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionsExchangeFuture]
>  Finish exchange future [startVer=AffinityTopologyVersion [topVer=36, 
> minorTopVer=0], resVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> err=null]
> [15:10:05,702][INFO][exchange-worker-#62%statement_grid%][time] Finished 
> exchange init [topVer=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> crd=false]
> [15:10:05,703][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
> [topVer=36, minorTopVer=0], evt=NODE_JOINED, 
> node=979cf868-1c37-424a-9ad1-12db501f32ef]
> [15:10:08,706][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Cancelled rebalancing from all nodes [topology=AffinityTopologyVersion 
> [topVer=35, minorTopVer=0]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing scheduled [order=[statementp]]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridCachePartitionExchangeManager]
>  Rebalancing started [top=null, evt=NODE_JOINED, 
> node=a8be3c14-9add-48c3-b099-3fd304cfdbf4]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=2f6bde48-ffb5-4815-bd32-df4e57dc13e0, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,707][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=35d01141-4dce-47dd-adf6-a4f3b2bb9da9, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=b3a8be53-e61f-4023-a906-a265923837ba, partitionsCount=15, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=f825cb4e-7dcc-405f-a40d-c1dc1a3ade5a, partitionsCount=12, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=4ae1db91-8b88-4180-a84b-127a303959e9, partitionsCount=11, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> [15:10:08,708][INFO][exchange-worker-#62%statement_grid%][GridDhtPartitionDemander]
>  Starting rebalancing [mode=ASYNC, 
> fromNode=7c286481-7638-49e4-8c68-fa6aa65d8b76, partitionsCount=18, 
> topology=AffinityTopologyVersion [topVer=36, minorTopVer=0], 
> updateSeq=-1754630006]
> So in clusters with a large amount of data and frequent client leave/join 
> events, this means that a new server will never receive its partitions.
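
The reasoning in the description can be illustrated with a toy model. This is 
only a sketch under simplifying assumptions that are not taken from Ignite 
itself: rebalancing needs a fixed number of uninterrupted seconds, every 
client join discards all progress, and nothing else affects the process. The 
function name and parameters are hypothetical.

```python
def rebalance_completion_time(rebalance_seconds, join_times, horizon):
    """Return the time at which rebalancing completes, or None if it does not
    complete within `horizon` seconds.

    Assumption: each client join restarts rebalancing from scratch, which is
    the behaviour reported in this issue.
    """
    start = 0  # time at which the current rebalancing attempt began
    for t in sorted(join_times):
        if t >= horizon:
            break
        if t - start >= rebalance_seconds:
            # The current attempt finished before this join arrived.
            return start + rebalance_seconds
        start = t  # the join cancels the attempt and a new one begins
    if horizon - start >= rebalance_seconds:
        return start + rebalance_seconds
    return None


# Rebalancing needs two hours; a client joins every ten minutes for a day.
frequent_joins = range(600, 86400, 600)
print(rebalance_completion_time(7200, frequent_joins, 86400))

# Only two early joins: rebalancing completes after the last restart.
print(rebalance_completion_time(7200, [600, 1200], 86400))
```

Under these assumptions, whenever the gap between client joins stays shorter 
than the time a full rebalance needs, no attempt ever runs to completion.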



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
