Sergey Korotkov created IGNITE-23971:
----------------------------------------
Summary: Striped pool queue overflow in ScanQuery for
touched/accessed expiry TTL policy
Key: IGNITE-23971
URL: https://issues.apache.org/jira/browse/IGNITE-23971
Project: Ignite
Issue Type: Bug
Reporter: Sergey Korotkov
Assignee: Sergey Korotkov
A ScanQuery on a cache configured with the AccessedExpiryPolicy or
TouchedExpiryPolicy TTL policy overflows the striped pool queue with
GridCacheTtlUpdateRequest messages. This causes pool starvation, heap memory
exhaustion, full GC cycles, JVM pauses and, finally, a cluster crash.
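The backlog mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Ignite code: a plain `ThreadPoolExecutor` with one thread and an unbounded queue stands in for a single stripe, and the class name `StripeBacklogSketch` and the empty tasks are hypothetical placeholders for TTL update requests. It shows that while the worker is busy, every further request is accepted and kept on the heap.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Toy model of one busy stripe (hypothetical; this is not Ignite's StripedExecutor). */
final class StripeBacklogSketch {
    /** Returns how many requests are left waiting while the single worker is stuck. */
    static int backlog(int requests) {
        CountDownLatch workerBusy = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // One worker thread with an unbounded queue, standing in for a single stripe.
        ThreadPoolExecutor stripe = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        try {
            // Occupy the worker, as a slow TTL update would.
            stripe.execute(() -> {
                workerBusy.countDown();

                try {
                    release.await();
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            workerBusy.await();

            // Nothing rejects or throttles these requests, so they all pile up in the queue.
            for (int i = 0; i < requests; i++)
                stripe.execute(() -> { /* Stand-in for one TTL update request. */ });

            return stripe.getQueue().size();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();

            throw new IllegalStateException(e);
        }
        finally {
            // Let the worker drain the queue and stop.
            release.countDown();
            stripe.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("queued=" + backlog(15_892));
    }
}
```

In this model the queue size equals the number of submitted requests, because the producer is never slowed down. That matches the shape of the metrics in the logs (all stripes active, large qSize), although the real striped pool and message handling are more involved.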
{noformat}
[2024-12-12T11:16:34,525][INFO
][grid-timeout-worker-#54%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][dev]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=8e2174c0,
name=isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev,
uptime=00:01:31.893]
^-- Cluster [hosts=2, CPUs=48, servers=2, clients=0, topVer=6,
minorTopVer=0]
^-- Network [addrs=[172.20.0.4], localHost=172.20.0.4, discoPort=47500,
commPort=47100]
^-- CPU [CPUs=24, curLoad=97.73%, avgLoad=38.7%, GC=7.03%]
^-- Heap [used=4829MB, free=5.66%, comm=5120MB]
^-- Outbound messages queue [size=580]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=24, qSize=0]
^-- Query thread pool [active=0, idle=24, qSize=0]
^-- Striped thread pool [active=24, idle=0, qSize=15892]
{noformat}
{noformat}
[2024-12-12T11:17:09,620][WARN ][jvm-pause-detector-worker][dev] Possible too
long JVM pause: 1574 milliseconds.
[2024-12-12T11:17:09,625][ERROR][grid-timeout-worker-#54%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [workerName=sys-stripe-16,
threadName=sys-stripe-16-#17%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%,
blockedFor=10s]
[2024-12-12T11:17:09,633][WARN
][grid-timeout-worker-#54%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][]
Possible failure suppressed accordingly to a configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=sys-stripe-16,
igniteInstanceName=isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev,
finished=false, heartbeatTs=1733991419356]]]
org.apache.ignite.IgniteException: GridWorker [name=sys-stripe-16,
igniteInstanceName=isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev,
finished=false, heartbeatTs=1733991419356]
at java.base/jdk.internal.misc.Unsafe.unpark(Native Method) ~[?:?]
at
java.base/java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:160)
~[?:?]
at
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:709)
~[?:?]
at
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1305)
~[?:?]
at
java.base/java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:439)
~[?:?]
at
org.apache.ignite.internal.util.OffheapReadWriteLock.writeUnlock(OffheapReadWriteLock.java:304)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1735)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:539)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:531)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:416)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:323)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:325)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.updateDataRow(AbstractFreeList.java:788)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.RowStore.updateRow(RowStore.java:156)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1539)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:2512)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:4380)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:4317)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3942)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2262)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:2138)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1497)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1480)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2530)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:395)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3446)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3420)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateTtlUnlocked(GridCacheMapEntry.java:2230)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateTtlUnlocked(GridCacheMapEntry.java:2203)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateTtl(GridCacheMapEntry.java:3331)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.updateTtl(GridDhtCacheAdapter.java:1510)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processTtlUpdateRequest(GridDhtCacheAdapter.java:1467)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$$Lambda$1722/0x00000008409ff440.apply(Unknown
Source) ~[?:?]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1096)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:597)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:398)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:316)
~[classes/:?]
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:306)
~[classes/:?]
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1882)
~[classes/:?]
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1503)
~[classes/:?]
at
org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1407)
~[classes/:?]
at
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
~[classes/:?]
at
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:637)
~[classes/:?]
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
[classes/:?]
at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
[2024-12-12T11:17:09,663][WARN
][grid-timeout-worker-#54%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][FailureProcessor]
No deadlocked threads detected.
[2024-12-12T11:17:12,100][WARN ][jvm-pause-detector-worker][dev] Possible too
long JVM pause: 2429 milliseconds.
{noformat}
{noformat}
[2024-12-12T11:17:31,237][ERROR][grid-timeout-worker-#54%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][G]
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [workerName=sys-stripe-14,
threadName=sys-stripe-14-#15%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%,
blockedFor=16s]
[2024-12-12T11:17:31,239][WARN
][push-metrics-exporter-#123%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][PoolProcessor]
>>> Possible starvation in striped pool.
Thread name:
sys-stripe-14-#15%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%
Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE,
topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false,
msg=GridCacheTtlUpdateRequest [keys=ArrayList [KeyCacheObjectImpl [part=1,
val=null, hasValBytes=true], KeyCacheObjectImpl [part=1, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=2, val=null, hasValBytes=true],
KeyCacheObjectImpl [part=2, val=null, hasValBytes=true], KeyCacheObjectImpl
[part=4, val=null, hasValBytes=true], KeyCacheObjectImpl [part=4, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=6, val=null, hasValBytes=true],
KeyCacheObjectImpl [part=6, val=null, hasValBytes=true], KeyCacheObjectImpl
[part=7, val=null, hasValBytes=true], KeyCacheObjectImpl [part=7, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=8, val=null, hasValBytes=true],
KeyCacheObjectImpl [part=8, val=null, hasValBytes=true], KeyCacheObjectImpl
[part=9, val=null, hasValBytes=true], KeyCacheObjectImpl [part=9, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=11, val=null, hasValBytes=true],
KeyCacheObjectImpl [part=11, val=null, hasValBytes=true], KeyCacheObjectImpl
[part=12, val=null, hasValBytes=true], KeyCacheObjectImpl [part=12, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=17, val=null, hasValBytes=true],
KeyCacheObjectImpl [part=17, val=null, hasValBytes=true], KeyCacheObjectImpl
[part=20, val=null, hasValBytes=true], KeyCacheObjectImpl [part=20, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=23, val=null, hasValBytes=true],
KeyCacheObjectImpl [part=23, val=null, hasValBytes=true], KeyCacheObjectImpl
[part=31, val=null, hasValBytes=true], KeyCacheObjectImpl [part=31, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=34, val=null, hasValBytes=true],
KeyCacheObjectImpl [part=34, val=null, hasValBytes=true], KeyCacheObjectImpl
[part=35, val=null, hasValBytes=true], KeyCacheObjectImpl [part=35, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=36, val=null, hasValBytes=true],
KeyCacheObjectImpl [part=36, val=null, hasValBytes=true], KeyCacheObjectImpl
[part=38, val=null, hasValBytes=true], KeyCacheObjectImpl [part=38, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=39, val=null, hasValBytes=true],
KeyCacheObjectImpl [part=39, val=null, hasValBytes=true], KeyCacheObjectImpl
[part=40, val=null, hasValBytes=true], KeyCacheObjectImpl [part=40, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=41, val=null, hasValBytes=true],
KeyCacheObjectImpl [part=41, val=null, hasValBytes=true], KeyCacheObjectImpl
[part=42, val=null, hasValBytes=true], KeyCacheObjectImpl [part=42, val=null,
hasValBytes=true], KeyCacheObjectImpl [part=45, val=null, hasValBytes=true],
...... and so on
{noformat}