[ 
https://issues.apache.org/jira/browse/IGNITE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507667#comment-17507667
 ] 

Anton Kalashnikov commented on IGNITE-14197:
--------------------------------------------

That is actually a good question. I remember that we discussed it, but I can't 
find the decision about closing it. Perhaps we expected to fix this problem in 
a different ticket, but I don't see a linked ticket here either.

[~sergey-chugunov] or [~ibessonov], can you check whether this task is still 
relevant? If it is, we can reopen the PR.

> Checkpoint thread can't take checkpoint write lock because it waits for 
> parked threads to complete their work
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-14197
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14197
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Anton Kalashnikov
>            Assignee: Anton Kalashnikov
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When write throttling is enabled and the node parks a thread (for example, a 
> data streamer thread), the parked thread still holds the checkpoint read 
> lock. This leads to long pauses while waiting for the checkpoint lock:
> [2020-07-23 07:09:21,614][INFO ][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint 
> started [checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a, 
> startPtr=FileWALPointer [idx=56913, fileOff=10362905, len=41972], 
> checkpointBeforeLockTime=1983ms, *checkpointLockWait=812117ms*, 
> checkpointListenersExecuteTime=90ms, checkpointLockHoldTime=93ms, 
> walCpRecordFsyncDuration=123ms, writeCheckpointEntryDuration=4ms, 
> splitAndSortCpPagesDuration=4155ms, pages=10516815, reason='too big size of 
> WAL without checkpoint']
> All operations are blocked during this wait (about 13.5 minutes here).
> In some cases the park timeout becomes extreme (about 5.9 hours here):
> Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855*
> {quote}"data-streamer-stripe-78-#175" #209 prio=5 os_prio=0 tid=0x00007f6161d6a800 nid=0xf932 waiting on condition [0x00007f5c292d1000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483)
> at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394)
> at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369)
> at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3441)
> at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:770)
> at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2278)
> at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:139)
> at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7104)
> at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:966)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
> at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
> at java.lang.Thread.run(Thread.java:748)
> {quote}
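
For illustration only, here is a minimal standalone sketch of the locking 
pattern described in the issue above. It is not Ignite code: a plain 
ReentrantReadWriteLock stands in for the checkpoint lock, and the class name 
CheckpointLockDemo, the method writeLockWaitMs, and the releaseBeforePark flag 
are hypothetical names made up for this example. It assumes the only thing 
that matters is that a worker parks while holding the read lock, so that a 
writer (the "checkpointer") has to wait for the whole park duration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockDemo {
    /**
     * Starts a worker that takes the read lock and then parks for parkMs.
     * Returns how long (ms) the calling thread waited for the write lock.
     * If releaseBeforePark is true, the worker drops the read lock before parking.
     */
    static long writeLockWaitMs(boolean releaseBeforePark, long parkMs) throws InterruptedException {
        // Stand-in for the checkpoint lock (assumption, not the Ignite class).
        ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock();
        CountDownLatch readLocked = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            cpLock.readLock().lock();
            boolean held = true;
            try {
                readLocked.countDown();
                if (releaseBeforePark) {
                    cpLock.readLock().unlock();
                    held = false;
                }
                // parkNanos may return early, so loop until the deadline passes.
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(parkMs);
                long remaining;
                while ((remaining = deadline - System.nanoTime()) > 0)
                    LockSupport.parkNanos(remaining);
            } finally {
                if (held)
                    cpLock.readLock().unlock();
            }
        }, "data-streamer-stripe-demo");

        worker.start();
        readLocked.await(); // the worker has acquired the read lock at this point

        long start = System.nanoTime();
        cpLock.writeLock().lock(); // the "checkpointer" asks for the write lock
        long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        cpLock.writeLock().unlock();

        worker.join();
        return waitedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("park while holding read lock: waited "
            + writeLockWaitMs(false, 300) + " ms");
        System.out.println("release read lock, then park: waited "
            + writeLockWaitMs(true, 300) + " ms");
    }
}
```

In the first call the write-lock wait is roughly the park duration; in the 
second it should be close to zero. Whether releasing the checkpoint read lock 
before parking is safe in the real throttling code path is a separate question 
that this sketch does not answer.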



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
