Pavel Voronkin created IGNITE-10679:
---------------------------------------

             Summary: Add more debug info for 'Affinity changes' PME stage.
                 Key: IGNITE-10679
                 URL: https://issues.apache.org/jira/browse/IGNITE-10679
             Project: Ignite
          Issue Type: Improvement
            Reporter: Pavel Voronkin


At current stage of PME time optimizations, next possible target to optimize is 
'Affinity changes' apply stage, because it starts to slowdown PME when 
configured number of partitions is increased (up to 32k) 

We need to add more debug information to understand what is the sources of 
slowdown.

Example log:
{noformat}

[15:21:55,326][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Received full 
message, will finish exchange [node=8ca2f878-70cf-4d34-a3ea-97b244f8d3d6, 
resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0]]
[15:21:55,725][INFO][sys-#121][CacheAffinitySharedManager] Affinity applying 
from full message performed in 398 ms.
[15:21:56,173][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=740ms, reason='timeout']
[15:21:57,150][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] 
Skipping checkpoint (no pages were modified) [checkpointLockWait=13ms, 
checkpointLockHoldTime=699ms, reason='timeout']
[15:21:58,189][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] 
Skipping checkpoint (no pages were modified) [checkpointLockWait=4ms, 
checkpointLockHoldTime=730ms, reason='timeout']
[15:21:58,340][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Affinity 
changes applied in 3013 ms.
[15:21:59,131][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] 
Skipping checkpoint (no pages were modified) [checkpointLockWait=6ms, 
checkpointLockHoldTime=664ms, reason='timeout']
[15:21:59,311][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Full map 
updating for 67 groups performed in 971 ms.
[15:21:59,311][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Finish exchange 
future [startVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], 
resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], err=null]
[15:22:00,039][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] 
Skipping checkpoint (no pages were modified) [checkpointLockWait=4ms, 
checkpointLockHoldTime=569ms, reason='timeout']
[15:22:00,108][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Detecting lost 
partitions performed in 796 ms.
[15:22:00,376][WARNING][sys-stripe-14-#15][finish] Received finish request for 
completed transaction (the message may be too late) [txId=GridCacheVersion 
[topVer=153677859, order=1542197978690, nodeOrder=10], dhtTxId=null, 
node=ca22eac2-862a-4fed-a84b-4abe85bfab25, commit=false]
[15:22:00,376][WARNING][sys-stripe-9-#10][finish] Received finish request for 
completed transaction (the message may be too late) [txId=GridCacheVersion 
[topVer=153677859, order=1542197978695, nodeOrder=10], dhtTxId=null, 
node=ca22eac2-862a-4fed-a84b-4abe85bfab25, commit=false]
[15:22:00,376][WARNING][sys-stripe-10-#11][GridDhtColocatedCache] 
<cache_group_2_043> Failed to acquire lock (transaction has been completed): 
GridCacheVersion [topVer=153677859, order=1542197978695, nodeOrder=10]
[15:22:00,376][WARNING][sys-stripe-11-#12][GridDhtColocatedCache] 
<cache_group_1_027> Failed to acquire lock (transaction has been completed): 
GridCacheVersion [topVer=153677859, order=1542197978690, nodeOrder=10]
[15:22:01,404][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] 
Skipping checkpoint (no pages were modified) [checkpointLockWait=2ms, 
checkpointLockHoldTime=928ms, reason='timeout']
[15:22:02,069][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] 
Checkpoint started [checkpointId=4a3ff280-b07c-48fc-8352-dd1b09e4ddbb, 
startPtr=FileWALPointer [idx=35, fileOff=11841470, len=11791007], 
checkpointLockWait=5ms, checkpointLockHoldTime=568ms, 
walCpRecordFsyncDuration=14ms, pages=12, reason='timeout']
[15:22:02,070][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] 
Checkpoint finished [cpId=4a3ff280-b07c-48fc-8352-dd1b09e4ddbb, pages=12, 
markPos=FileWALPointer [idx=35, fileOff=11841470, len=11791007], 
walSegmentsCleared=0, walSegmentsCovered=[], markDuration=584ms, 
pagesWrite=0ms, fsync=1ms, total=590ms]
[15:22:02,476][INFO][exchange-worker-#66][GridCachePartitionExchangeManager] 
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion 
[topVer=13, minorTopVer=0], evt=NODE_FAILED, 
node=a10011b8-7431-4bba-8b17-565a2a2be222]
[15:22:02,760][WARNING][sys-stripe-11-#12][finish] Received finish request for 
completed transaction (the message may be too late) [txId=GridCacheVersion 
[topVer=153677859, order=1542197978805, nodeOrder=10], dhtTxId=null, 
node=ca22eac2-862a-4fed-a84b-4abe85bfab25, commit=false]
[15:22:03,044][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] 
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, 
checkpointLockHoldTime=557ms, reason='timeout']
[15:22:04,298][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager] 
Checkpoint started [checkpointId=4c2fd666-4fba-41a5-8ca8-d5a0306ed757, 
startPtr=FileWALPointer [idx=35, fileOff=23657694, len=11791007], 
checkpointLockWait=0ms, checkpointLockHoldTime=794ms, 
walCpRecordFsyncDuration=9ms, pages=6, reason='timeout']
{noformat}

{noformat}
Affinity applying from full message performed in 398 ms.
..
Affinity changes applied in 3013 ms.
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to