[
https://issues.apache.org/jira/browse/IGNITE-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514678#comment-17514678
]
Sergey Chugunov edited comment on IGNITE-14197 at 3/30/22, 12:48 PM:
---------------------------------------------------------------------
[~akalashnikov], I believe this fix was incorporated as part of a larger effort
to refactor our throttlers and fix several issues in them: IGNITE-16581,
IGNITE-16600. I checked the code, and the main idea of the PR is present in the
refactored code.
[~dpavlov], does it make sense to you to close the ticket as well? BTW I added
links to the related tickets.
was (Author: sergeychugunov):
[~akalashnikov], I believe this fix was incorporated as part of a bigger effort
of refactoring and fixing some issues in our throttlers: IGNITE-16581,
IGNITE-16582, IGNITE-16600. I checked the code, main idea of the PR is
incorporated in the refactored code.
[~dpavlov], does it make sense to you to close the ticket as well? BTW I added
links to the related tickets.
> Checkpoint thread can't take checkpoint write lock because it waits for
> parked threads to complete their work
> -------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-14197
> URL: https://issues.apache.org/jira/browse/IGNITE-14197
> Project: Ignite
> Issue Type: Bug
> Reporter: Anton Kalashnikov
> Assignee: Anton Kalashnikov
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When write throttling is enabled and the node parks a thread (for example, a
> data streamer thread), the parked thread still holds the checkpoint read
> lock. This leads to long pauses while the checkpoint thread waits for the
> checkpoint write lock:
> [2020-07-23 07:09:21,614][INFO
> ][db-checkpoint-thread-#371][GridCacheDatabaseSharedManager] Checkpoint
> started [checkpointId=f964c8f2-daa5-41b2-80ef-944326f26f8a,
> startPtr=FileWALPointer [idx=56913, fileOff=10362905, len=41972],
> checkpointBeforeLockTime=1983ms, *checkpointLockWait=812117ms*,
> checkpointListenersExecuteTime=90ms, checkpointLockHoldTime=93ms,
> walCpRecordFsyncDuration=123ms, writeCheckpointEntryDuration=4ms,
> splitAndSortCpPagesDuration=4155ms, pages=10516815, reason='too big size of
> WAL without checkpoint']
> All operations are blocked for the duration of this wait.
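A long checkpointLockWait like the one in the log above can be picked out of such lines automatically. The following is a minimal sketch, assuming the "Checkpoint started" message has been joined back into a single line; the class name, method name and regular expression are illustrative and are not part of Ignite.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckpointLogMetrics {
    // Matches "name=123ms" pairs; fields without the "ms" suffix (pages, idx, ...) are skipped.
    private static final Pattern DURATION = Pattern.compile("(\\w+)=(\\d+)ms");

    /** Hypothetical helper: collects every "<name>=<number>ms" field of a log line, in order. */
    static Map<String, Long> durationsMs(String logLine) {
        Map<String, Long> res = new LinkedHashMap<>();
        Matcher m = DURATION.matcher(logLine);

        while (m.find())
            res.put(m.group(1), Long.parseLong(m.group(2)));

        return res;
    }

    public static void main(String[] args) {
        // Shortened copy of the log line from the issue description.
        String line = "Checkpoint started [checkpointBeforeLockTime=1983ms, "
            + "*checkpointLockWait=812117ms*, checkpointLockHoldTime=93ms, pages=10516815]";

        Map<String, Long> metrics = durationsMs(line);

        // The 10 s threshold is an arbitrary value chosen for this example.
        if (metrics.getOrDefault("checkpointLockWait", 0L) > 10_000)
            System.out.println("Slow checkpoint lock acquisition: " + metrics);
    }
}
```

The Jira bold markers (asterisks) around a field do not need to be stripped, because the pattern only consumes word characters on the left of the equals sign.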
> In the worst case a thread can be parked for hours (the timeout below is
> roughly 5.9 hours):
> Parking thread=data-streamer-stripe-47-#144 for timeout(ms)=*21278855*
> {quote}"data-streamer-stripe-78-#175" #209 prio=5 os_prio=0 tid=0x00007f6161d6a800 nid=0xf932 waiting on condition [0x00007f5c292d1000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.doPark(PagesWriteSpeedBasedThrottle.java:244)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PagesWriteSpeedBasedThrottle.onMarkDirty(PagesWriteSpeedBasedThrottle.java:227)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1730)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:491)
> at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:483)
> at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:394)
> at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:369)
> at org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:296)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$11300(BPlusTree.java:98)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.tryInsert(BPlusTree.java:3864)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.access$7100(BPlusTree.java:3544)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.onNotFound(BPlusTree.java:4103)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5800(BPlusTree.java:3894)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2022)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:1997)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1904)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1662)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1645)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2473)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:436)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4306)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3441)
> at org.apache.ignite.internal.processors.cache.GridCacheEntryEx.initialValue(GridCacheEntryEx.java:770)
> at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$IsolatedUpdater.receive(DataStreamerImpl.java:2278)
> at org.apache.ignite.internal.processors.datastreamer.DataStreamerUpdateJob.call(DataStreamerUpdateJob.java:139)
> at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:7104)
> at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:966)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
> at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:559)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
> at java.lang.Thread.run(Thread.java:748)
> {quote}
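The blocking described in the issue can be reproduced in isolation. The following is a minimal sketch, assuming a plain java.util.concurrent ReentrantReadWriteLock stands in for the checkpoint lock; it does not use Ignite's actual lock or throttle classes, and the class and method names are hypothetical. A worker takes the read lock and parks, as the throttled data streamer thread does in the stack trace above, while the caller plays the checkpoint thread and tries to take the write lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ParkedReaderDemo {
    /**
     * Hypothetical helper: a worker takes the read lock and parks for parkMs.
     * Returns true if the caller obtains the write lock within waitMs.
     */
    static boolean writerAcquires(long parkMs, long waitMs) {
        ReentrantReadWriteLock cpLock = new ReentrantReadWriteLock();
        CountDownLatch readLocked = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            cpLock.readLock().lock();
            try {
                readLocked.countDown();
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(parkMs);
                long left;
                // parkNanos may return early, so re-check the deadline in a loop.
                while ((left = deadline - System.nanoTime()) > 0
                    && !Thread.currentThread().isInterrupted())
                    LockSupport.parkNanos(left);
            }
            finally {
                cpLock.readLock().unlock();
            }
        }, "parked-worker");

        worker.start();

        try {
            readLocked.await(); // make sure the read lock is held before the writer starts

            boolean acquired = cpLock.writeLock().tryLock(waitMs, TimeUnit.MILLISECONDS);

            if (acquired)
                cpLock.writeLock().unlock();

            return acquired;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        finally {
            worker.interrupt(); // wake the worker so the demo ends promptly
        }
    }

    public static void main(String[] args) {
        System.out.println("long park, short wait: " + writerAcquires(2_000, 100));
        System.out.println("short park, long wait: " + writerAcquires(50, 2_000));
    }
}
```

With a park time longer than the writer's wait, the write lock is not obtained. This is the same shape as the checkpointLockWait reported in the issue: the writer cannot proceed until every parked reader wakes up and releases its read lock.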
--
This message was sent by Atlassian Jira
(v8.20.1#820001)