Pavel Voronkin created IGNITE-10679:
---------------------------------------
Summary: Add more debug info for 'Affinity changes' PME stage.
Key: IGNITE-10679
URL: https://issues.apache.org/jira/browse/IGNITE-10679
Project: Ignite
Issue Type: Improvement
Reporter: Pavel Voronkin
At current stage of PME time optimizations, next possible target to optimize is
'Affinity changes' apply stage, because it starts to slowdown PME when
configured number of partitions is increased (up to 32k)
We need to add more debug information to understand what is the sources of
slowdown.
Example log:
{noformat}
[15:21:55,326][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Received full
message, will finish exchange [node=8ca2f878-70cf-4d34-a3ea-97b244f8d3d6,
resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0]]
[15:21:55,725][INFO][sys-#121][CacheAffinitySharedManager] Affinity applying
from full message performed in 398 ms.
[15:21:56,173][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager]
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms,
checkpointLockHoldTime=740ms, reason='timeout']
[15:21:57,150][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager]
Skipping checkpoint (no pages were modified) [checkpointLockWait=13ms,
checkpointLockHoldTime=699ms, reason='timeout']
[15:21:58,189][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager]
Skipping checkpoint (no pages were modified) [checkpointLockWait=4ms,
checkpointLockHoldTime=730ms, reason='timeout']
[15:21:58,340][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Affinity
changes applied in 3013 ms.
[15:21:59,131][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager]
Skipping checkpoint (no pages were modified) [checkpointLockWait=6ms,
checkpointLockHoldTime=664ms, reason='timeout']
[15:21:59,311][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Full map
updating for 67 groups performed in 971 ms.
[15:21:59,311][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Finish exchange
future [startVer=AffinityTopologyVersion [topVer=13, minorTopVer=0],
resVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], err=null]
[15:22:00,039][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager]
Skipping checkpoint (no pages were modified) [checkpointLockWait=4ms,
checkpointLockHoldTime=569ms, reason='timeout']
[15:22:00,108][INFO][sys-#121][GridDhtPartitionsExchangeFuture] Detecting lost
partitions performed in 796 ms.
[15:22:00,376][WARNING][sys-stripe-14-#15][finish] Received finish request for
completed transaction (the message may be too late) [txId=GridCacheVersion
[topVer=153677859, order=1542197978690, nodeOrder=10], dhtTxId=null,
node=ca22eac2-862a-4fed-a84b-4abe85bfab25, commit=false]
[15:22:00,376][WARNING][sys-stripe-9-#10][finish] Received finish request for
completed transaction (the message may be too late) [txId=GridCacheVersion
[topVer=153677859, order=1542197978695, nodeOrder=10], dhtTxId=null,
node=ca22eac2-862a-4fed-a84b-4abe85bfab25, commit=false]
[15:22:00,376][WARNING][sys-stripe-10-#11][GridDhtColocatedCache]
<cache_group_2_043> Failed to acquire lock (transaction has been completed):
GridCacheVersion [topVer=153677859, order=1542197978695, nodeOrder=10]
[15:22:00,376][WARNING][sys-stripe-11-#12][GridDhtColocatedCache]
<cache_group_1_027> Failed to acquire lock (transaction has been completed):
GridCacheVersion [topVer=153677859, order=1542197978690, nodeOrder=10]
[15:22:01,404][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager]
Skipping checkpoint (no pages were modified) [checkpointLockWait=2ms,
checkpointLockHoldTime=928ms, reason='timeout']
[15:22:02,069][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager]
Checkpoint started [checkpointId=4a3ff280-b07c-48fc-8352-dd1b09e4ddbb,
startPtr=FileWALPointer [idx=35, fileOff=11841470, len=11791007],
checkpointLockWait=5ms, checkpointLockHoldTime=568ms,
walCpRecordFsyncDuration=14ms, pages=12, reason='timeout']
[15:22:02,070][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager]
Checkpoint finished [cpId=4a3ff280-b07c-48fc-8352-dd1b09e4ddbb, pages=12,
markPos=FileWALPointer [idx=35, fileOff=11841470, len=11791007],
walSegmentsCleared=0, walSegmentsCovered=[], markDuration=584ms,
pagesWrite=0ms, fsync=1ms, total=590ms]
[15:22:02,476][INFO][exchange-worker-#66][GridCachePartitionExchangeManager]
Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion
[topVer=13, minorTopVer=0], evt=NODE_FAILED,
node=a10011b8-7431-4bba-8b17-565a2a2be222]
[15:22:02,760][WARNING][sys-stripe-11-#12][finish] Received finish request for
completed transaction (the message may be too late) [txId=GridCacheVersion
[topVer=153677859, order=1542197978805, nodeOrder=10], dhtTxId=null,
node=ca22eac2-862a-4fed-a84b-4abe85bfab25, commit=false]
[15:22:03,044][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager]
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms,
checkpointLockHoldTime=557ms, reason='timeout']
[15:22:04,298][INFO][db-checkpoint-thread-#85][GridCacheDatabaseSharedManager]
Checkpoint started [checkpointId=4c2fd666-4fba-41a5-8ca8-d5a0306ed757,
startPtr=FileWALPointer [idx=35, fileOff=23657694, len=11791007],
checkpointLockWait=0ms, checkpointLockHoldTime=794ms,
walCpRecordFsyncDuration=9ms, pages=6, reason='timeout']
{noformat}
{noformat}
Affinity applying from full message performed in 398 ms.
..
Affinity changes applied in 3013 ms.
{noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)