[ 
https://issues.apache.org/jira/browse/IGNITE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514678#comment-17514678
 ] 

Sergey Chugunov edited comment on IGNITE-14197 at 3/30/22, 12:48 PM:
---------------------------------------------------------------------

[~akalashnikov], I believe this fix was incorporated as part of a bigger effort 
to refactor our throttlers and fix some issues in them: IGNITE-16581, 
IGNITE-16600. I checked the code: the main idea of the PR is incorporated in the 
refactored code.

[~dpavlov], does it make sense to you to close the ticket as well? BTW I added 
links to the related tickets.


was (Author: sergeychugunov):
[~akalashnikov], I believe this fix was incorporated as part of a bigger effort 
of refactoring and fixing some issues in our throttlers: IGNITE-16581, 
IGNITE-16582, IGNITE-16600. I checked the code, main idea of the PR is 
incorporated in the refactored code.

[~dpavlov], does it make sense to you to close the ticket as well? BTW I added 
links to the related tickets.

> Checkpoint thread can't take checkpoint write lock because it waits for 
> parked threads to complete their work
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-14197
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14197
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Anton Kalashnikov
>            Assignee: Anton Kalashnikov
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When write throttling is enabled and the node parks a thread (for example, a 
> data streamer thread), the parked thread still holds the checkpoint read lock. 
> The checkpoint thread then waits a long time for the checkpoint write lock:
> [2020-07-23 07:09:21,614][INFO 
> ][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint 
> started [checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a, 
> startPtr=FileWALPointer [idx=56913, fileOff=10362905, len=41972], 
> checkpointBeforeLockTime=1983ms, *checkpointLockWait=812117ms*, 
> checkpointListenersExecuteTime=90ms, checkpointLockHoldTime=93ms, 
> walCpRecordFsyncDuration=123ms, writeCheckpointEntryDuration=4ms, 
> splitAndSortCpPagesDuration=4155ms, pages=10516815, reason='too big size of 
> WAL without checkpoint']
> All operations are blocked while the checkpoint thread waits.
> Sometimes it can lead to a complete disaster, with a park timeout of several 
> hours:
> Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855*
> {quote}"data-streamer-stripe-78-#175" #209 prio=5 os_prio=0 tid=0x00007f6161d6a800 nid=0xf932 waiting on condition [0x00007f5c292d1000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483)
> at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394)
> at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369)
> at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3441)
> at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:770)
> at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2278)
> at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:139)
> at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7104)
> at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:966)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
> at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
> at java.lang.Thread.run(Thread.java:748)
> {quote}
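
The blocking pattern in the description above can be reproduced in isolation. The following is only a minimal sketch, not Ignite code: it assumes the checkpoint lock behaves like a plain java.util.concurrent ReentrantReadWriteLock, and the class name CheckpointLockDemo, the method writerAcquires, and the thread name are hypothetical. A "streamer" thread takes the read lock and parks with LockSupport.parkNanos while still holding it; the calling thread plays the checkpoint thread and tries to take the write lock with a bounded wait.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointLockDemo {
    /**
     * Hypothetical helper: a worker parks for parkMs while holding the read lock;
     * the caller then waits up to waitMs for the write lock.
     * Returns true if the write lock was acquired within waitMs.
     */
    public static boolean writerAcquires(long parkMs, long waitMs) throws InterruptedException {
        // Stand-in for the checkpoint lock (an assumption, not Ignite's actual class).
        ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();
        CountDownLatch readLockTaken = new CountDownLatch(1);

        Thread streamer = new Thread(() -> {
            checkpointLock.readLock().lock();
            try {
                readLockTaken.countDown();
                // "Throttling": park without releasing the read lock.
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(parkMs);
                long left;
                // parkNanos may return early, so loop until the deadline passes.
                while ((left = deadline - System.nanoTime()) > 0)
                    LockSupport.parkNanos(left);
            } finally {
                checkpointLock.readLock().unlock();
            }
        }, "data-streamer-stripe-demo");

        streamer.start();
        readLockTaken.await(); // make sure the read lock is held before the writer starts

        // "Checkpoint thread": cannot get the write lock while any reader holds the lock.
        boolean acquired = checkpointLock.writeLock().tryLock(waitMs, TimeUnit.MILLISECONDS);
        if (acquired)
            checkpointLock.writeLock().unlock();

        streamer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // Reader parks longer than the writer is willing to wait.
        System.out.println(writerAcquires(500, 100));  // false
        // Reader parks briefly; the writer gets the lock once the reader unparks.
        System.out.println(writerAcquires(100, 2000)); // true
    }
}
```

In this sketch the writer's wait is bounded by the reader's park time, which mirrors the relationship between the park timeout and checkpointLockWait reported above.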



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
