Sergey Korotkov created IGNITE-23971:
----------------------------------------

             Summary: Striped pool queue overflow in ScanQuery for 
touched/accessed expiry TTL policy
                 Key: IGNITE-23971
                 URL: https://issues.apache.org/jira/browse/IGNITE-23971
             Project: Ignite
          Issue Type: Bug
            Reporter: Sergey Korotkov
            Assignee: Sergey Korotkov


A ScanQuery on a cache configured with the AccessedExpiryPolicy or 
TouchedExpiryPolicy TTL policy overflows the striped pool queue with 
GridCacheTtlUpdateRequest messages.

This causes striped pool starvation, heap memory exhaustion, full GC cycles, 
long JVM pauses, and finally a cluster crash.
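
The queue growth can be pictured with a toy model. The sketch below is not Ignite code and makes no claim about the actual root cause: it only assumes that TTL update messages are routed to a fixed set of stripes and that they arrive faster than the stripes can process them. The class name, the method, and all rates are hypothetical; the 24 stripes merely mirror the "Striped thread pool [active=24...]" line in the metrics below.

```java
/**
 * Toy model (NOT Ignite code): a fixed number of stripes, each with its own
 * unbounded queue, fed by a producer that stands in for a scan emitting
 * TTL update messages. All names and rates are hypothetical.
 */
public class StripedQueueModel {
    /**
     * Runs the model for the given number of ticks.
     *
     * @param stripes                 number of stripe workers (one queue each)
     * @param enqueuedPerTick         messages the producer adds per tick
     * @param drainedPerStripePerTick messages each stripe processes per tick
     * @param ticks                   number of simulated ticks
     * @return total number of messages still queued across all stripes
     */
    public static long simulate(int stripes, int enqueuedPerTick,
                                int drainedPerStripePerTick, int ticks) {
        long[] queued = new long[stripes];
        int nextMsg = 0;

        for (int t = 0; t < ticks; t++) {
            // Producer side: route each message to a stripe, round-robin
            // (an assumed stand-in for partition-based routing).
            for (int i = 0; i < enqueuedPerTick; i++) {
                queued[nextMsg % stripes]++;
                nextMsg++;
            }

            // Consumer side: every stripe drains at most its per-tick capacity.
            for (int s = 0; s < stripes; s++)
                queued[s] = Math.max(0, queued[s] - drainedPerStripePerTick);
        }

        long total = 0;
        for (long q : queued)
            total += q;

        return total;
    }

    public static void main(String[] args) {
        // Arrival rate equals total drain rate: queues stay empty.
        System.out.println("balanced:   " + simulate(24, 24, 1, 1000));
        // Arrival rate exceeds total drain rate: the backlog grows every tick.
        System.out.println("overloaded: " + simulate(24, 40, 1, 1000));
    }
}
```

With these made-up rates the balanced run ends with an empty backlog, while the overloaded run ends with 16000 queued messages, growing by 16 on every tick. Since the queues in the model are unbounded, nothing slows the producer down, which is the kind of pattern the qSize value in the metrics suggests.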


{noformat}
[2024-12-12T11:16:34,525][INFO 
][grid-timeout-worker-#54%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][dev]
 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=8e2174c0, 
name=isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev,
 uptime=00:01:31.893]
    ^-- Cluster [hosts=2, CPUs=48, servers=2, clients=0, topVer=6, 
minorTopVer=0]
    ^-- Network [addrs=[172.20.0.4], localHost=172.20.0.4, discoPort=47500, 
commPort=47100]
    ^-- CPU [CPUs=24, curLoad=97.73%, avgLoad=38.7%, GC=7.03%]
    ^-- Heap [used=4829MB, free=5.66%, comm=5120MB]
    ^-- Outbound messages queue [size=580]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=24, qSize=0]
    ^-- Query thread pool [active=0, idle=24, qSize=0]
    ^-- Striped thread pool [active=24, idle=0, qSize=15892]
{noformat}


{noformat}
[2024-12-12T11:17:09,620][WARN ][jvm-pause-detector-worker][dev] Possible too 
long JVM pause: 1574 milliseconds.
[2024-12-12T11:17:09,625][ERROR][grid-timeout-worker-#54%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][G]
 Blocked system-critical thread has been detected. This can lead to 
cluster-wide undefined behaviour [workerName=sys-stripe-16, 
threadName=sys-stripe-16-#17%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%,
 blockedFor=10s]
[2024-12-12T11:17:09,633][WARN 
][grid-timeout-worker-#54%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][]
 Possible failure suppressed accordingly to a configured handler 
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class 
o.a.i.IgniteException: GridWorker [name=sys-stripe-16, 
igniteInstanceName=isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev,
 finished=false, heartbeatTs=1733991419356]]]
org.apache.ignite.IgniteException: GridWorker [name=sys-stripe-16, 
igniteInstanceName=isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev,
 finished=false, heartbeatTs=1733991419356]
        at java.base/jdk.internal.misc.Unsafe.unpark(Native Method) ~[?:?]
        at 
java.base/java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:160) 
~[?:?]
        at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:709)
 ~[?:?]
        at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1305)
 ~[?:?]
        at 
java.base/java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:439)
 ~[?:?]
        at 
org.apache.ignite.internal.util.OffheapReadWriteLock.writeUnlock(OffheapReadWriteLock.java:304)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlockPage(PageMemoryImpl.java:1735)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:539)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.writeUnlock(PageMemoryImpl.java:531)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writeUnlock(PageHandler.java:416)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.writePage(PageHandler.java:323)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.DataStructure.write(DataStructure.java:325)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.updateDataRow(AbstractFreeList.java:788)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.RowStore.updateRow(RowStore.java:156)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.createRow(IgniteCacheOffheapManagerImpl.java:1539)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.createRow(GridCacheOffheapManager.java:2512)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:4380)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry$UpdateClosure.call(GridCacheMapEntry.java:4317)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(BPlusTree.java:3942)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(BPlusTree.java:2262)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:2138)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1497)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1480)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2530)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:395)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3446)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3420)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateTtlUnlocked(GridCacheMapEntry.java:2230)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateTtlUnlocked(GridCacheMapEntry.java:2203)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateTtl(GridCacheMapEntry.java:3331)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.updateTtl(GridDhtCacheAdapter.java:1510)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processTtlUpdateRequest(GridDhtCacheAdapter.java:1467)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter$$Lambda$1722/0x00000008409ff440.apply(Unknown
 Source) ~[?:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1096)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:597)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:398)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:316)
 ~[classes/:?]
        at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:306)
 ~[classes/:?]
        at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1882)
 ~[classes/:?]
        at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1503)
 ~[classes/:?]
        at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1407)
 ~[classes/:?]
        at 
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
 ~[classes/:?]
        at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:637)
 ~[classes/:?]
        at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) 
[classes/:?]
        at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
[2024-12-12T11:17:09,663][WARN 
][grid-timeout-worker-#54%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][FailureProcessor]
 No deadlocked threads detected.
[2024-12-12T11:17:12,100][WARN ][jvm-pause-detector-worker][dev] Possible too 
long JVM pause: 2429 milliseconds.
{noformat}

{noformat}
[2024-12-12T11:17:31,237][ERROR][grid-timeout-worker-#54%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][G]
 Blocked system-critical thread has been detected. This can lead to 
cluster-wide undefined behaviour [workerName=sys-stripe-14, 
threadName=sys-stripe-14-#15%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%,
 blockedFor=16s]
[2024-12-12T11:17:31,239][WARN 
][push-metrics-exporter-#123%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%][PoolProcessor]
 >>> Possible starvation in striped pool.
    Thread name: 
sys-stripe-14-#15%isetest.tests.ttl.touched_policy_test.TouchedPolicyTest.scan_query_test.ignite_version.dev%
    Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, 
topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, 
msg=GridCacheTtlUpdateRequest [keys=ArrayList [KeyCacheObjectImpl [part=1, 
val=null, hasValBytes=true], KeyCacheObjectImpl [part=1, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=2, val=null, hasValBytes=true], 
KeyCacheObjectImpl [part=2, val=null, hasValBytes=true], KeyCacheObjectImpl 
[part=4, val=null, hasValBytes=true], KeyCacheObjectImpl [part=4, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=6, val=null, hasValBytes=true], 
KeyCacheObjectImpl [part=6, val=null, hasValBytes=true], KeyCacheObjectImpl 
[part=7, val=null, hasValBytes=true], KeyCacheObjectImpl [part=7, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=8, val=null, hasValBytes=true], 
KeyCacheObjectImpl [part=8, val=null, hasValBytes=true], KeyCacheObjectImpl 
[part=9, val=null, hasValBytes=true], KeyCacheObjectImpl [part=9, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=11, val=null, hasValBytes=true], 
KeyCacheObjectImpl [part=11, val=null, hasValBytes=true], KeyCacheObjectImpl 
[part=12, val=null, hasValBytes=true], KeyCacheObjectImpl [part=12, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=17, val=null, hasValBytes=true], 
KeyCacheObjectImpl [part=17, val=null, hasValBytes=true], KeyCacheObjectImpl 
[part=20, val=null, hasValBytes=true], KeyCacheObjectImpl [part=20, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=23, val=null, hasValBytes=true], 
KeyCacheObjectImpl [part=23, val=null, hasValBytes=true], KeyCacheObjectImpl 
[part=31, val=null, hasValBytes=true], KeyCacheObjectImpl [part=31, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=34, val=null, hasValBytes=true], 
KeyCacheObjectImpl [part=34, val=null, hasValBytes=true], KeyCacheObjectImpl 
[part=35, val=null, hasValBytes=true], KeyCacheObjectImpl [part=35, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=36, val=null, hasValBytes=true], 
KeyCacheObjectImpl [part=36, val=null, hasValBytes=true], KeyCacheObjectImpl 
[part=38, val=null, hasValBytes=true], KeyCacheObjectImpl [part=38, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=39, val=null, hasValBytes=true], 
KeyCacheObjectImpl [part=39, val=null, hasValBytes=true], KeyCacheObjectImpl 
[part=40, val=null, hasValBytes=true], KeyCacheObjectImpl [part=40, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=41, val=null, hasValBytes=true], 
KeyCacheObjectImpl [part=41, val=null, hasValBytes=true], KeyCacheObjectImpl 
[part=42, val=null, hasValBytes=true], KeyCacheObjectImpl [part=42, val=null, 
hasValBytes=true], KeyCacheObjectImpl [part=45, val=null, hasValBytes=true], 

......   and so on
{noformat}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
