[ https://issues.apache.org/jira/browse/IGNITE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507667#comment-17507667 ]
Anton Kalashnikov commented on IGNITE-14197:
--------------------------------------------

It is actually a good question. I remember that we discussed it, but I can't find the decision to close it. Perhaps we expected to fix this problem in a different ticket, but I don't see a linked ticket here either. [~sergey-chugunov] or [~ibessonov], can you check how relevant this task is? If it is still relevant, we can reopen the PR.

> Checkpoint thread can't take checkpoint write lock because it waits for parked threads to complete their work
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-14197
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14197
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Anton Kalashnikov
>            Assignee: Anton Kalashnikov
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When write throttling is enabled and a node parks a thread (for example, a data streamer thread), that thread still holds the checkpoint read lock. This leads to long pauses while the checkpoint thread waits for the checkpoint write lock:
> [2020-07-23 07:09:21,614][INFO ][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint started [checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a, startPtr=FileWALPointer [idx=56913, fileOff=10362905, len=41972], checkpointBeforeLockTime=1983ms, *checkpointLockWait=812117ms*, checkpointListenersExecuteTime=90ms, checkpointLockHoldTime=93ms, walCpRecordFsyncDuration=123ms, writeCheckpointEntryDuration=4ms, splitAndSortCpPagesDuration=4155ms, pages=10516815, reason='too big size of WAL without checkpoint']
> All operations are blocked at this moment.
> Sometimes, it can lead to a complete disaster:
> Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855*
> {quote}"data-streamer-stripe-78-#175" #209 prio=5 os_prio=0 tid=0x00007f6161d6a800 nid=0xf932 waiting on condition [0x00007f5c292d1000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
>         at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244)
>         at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227)
>         at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730)
>         at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491)
>         at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369)
>         at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
>         at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
>         at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473)
>         at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306)
>         at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3441)
>         at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:770)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2278)
>         at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:139)
>         at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7104)
>         at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:966)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>         at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>         at java.lang.Thread.run(Thread.java:748)
> {quote}

-- 
This message was sent by Atlassian Jira
(v8.20.1#820001)
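The following self-contained Java sketch models the lock interaction from the ticket. It rests on two assumptions. First, the checkpoint lock is treated as a plain `java.util.concurrent.locks.ReentrantReadWriteLock`, which simplifies the real checkpoint lock. Second, the names `CheckpointLockParkDemo`, `writerWaitMillis` and `parkFor` are hypothetical and are not part of Ignite's API.

The sketch measures how long a "checkpoint" thread waits for the write lock in two cases. In the first, the throttled thread parks while still holding the read lock, as in the thread dump. In the second, it parks only after releasing the lock. The second variant is just one possible mitigation and is not necessarily what the PR implemented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of IGNITE-14197; not Ignite's actual classes. */
final class CheckpointLockParkDemo {
    /** Parks for the whole duration; LockSupport.parkNanos may return early. */
    static void parkFor(long nanos) {
        long deadline = System.nanoTime() + nanos;
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0)
            LockSupport.parkNanos(remaining);
    }

    /**
     * Returns how many milliseconds the "checkpoint" thread waited for the write lock.
     * parkUnderLock = true:  the throttle park happens while the read lock is held.
     * parkUnderLock = false: the park is deferred until after the read lock is released.
     */
    static long writerWaitMillis(boolean parkUnderLock, long parkMillis) {
        ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock(); // stand-in checkpoint lock
        CountDownLatch readLockTaken = new CountDownLatch(1);
        long parkNanos = TimeUnit.MILLISECONDS.toNanos(parkMillis);

        Thread streamer = new Thread(() -> {
            cpLock.readLock().lock();
            try {
                readLockTaken.countDown();
                // ... page update happens here; the throttle decides the thread must slow down ...
                if (parkUnderLock)
                    parkFor(parkNanos); // the reported problem: read lock is still held
            } finally {
                cpLock.readLock().unlock();
            }
            if (!parkUnderLock)
                parkFor(parkNanos); // still throttled, but no longer blocks the writer
        }, "data-streamer-stripe-demo");
        streamer.start();

        try {
            readLockTaken.await(); // make sure the streamer holds the read lock first
            long start = System.nanoTime();
            cpLock.writeLock().lock(); // what the checkpoint thread has to acquire
            long waited = System.nanoTime() - start;
            cpLock.writeLock().unlock();
            streamer.join();
            return TimeUnit.NANOSECONDS.toMillis(waited);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("park under read lock: writer waited "
            + writerWaitMillis(true, 300) + " ms");
        System.out.println("park after release:   writer waited "
            + writerWaitMillis(false, 300) + " ms");
    }
}
```

In the first case, the writer's wait grows with the park duration. That is the same effect as `checkpointLockWait=812117ms` in the log, scaled down. In the second case, the streamer is still slowed by throttling, but the checkpointer only waits for the short critical section. The printed timings vary from run to run.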