[ https://issues.apache.org/jira/browse/IGNITE-17385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ilya Shishkov updated IGNITE-17385:
-----------------------------------
    Description:
When you commit a transaction that was explicitly started over only a single cache, {{GridCacheAdapter#asyncOpRelease}} is called without a preceding {{GridCacheAdapter#asyncOpAcquire}}. This can lead to continuous growth of the permit count in {{GridCacheAdapter#asyncOpsSem}} and eventually to an overflow, followed by a failure of the node that started the transaction:
{code}
Critical system error detected. Will be handled accordingly to configured handler [hnd=o.a.i.i.processors.cache.transactions.TxAsyncOpsSemaphorePermitsExeededTest$$Lambda$42/1924582348@7379bebb, failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.Error: Maximum permit count exceeded]]
{code}
As you can see in [1], for a single cache context the transaction is committed by calling {{GridCacheAdapter#commitTxAsync}}, which later invokes {{GridCacheAdapter#asyncOpRelease}}. When multiple caches are affected by the transaction, {{GridNearTxLocal#commitNearTxLocalAsync}} is called to commit it, and {{GridCacheAdapter#asyncOpRelease}} is not invoked.

So, the greater the load (RPS / TPS) of such single-cache transactions, the sooner the node will fail.

Reproducer of the problem: [^SemaphorePermitsExceeded.patch]. It prints additional messages when the semaphore is released or acquired.

Links:
 # https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheSharedContext.java#L1122

  was:
When you commit a transaction that was explicitly started over only a single cache, {{GridCacheAdapter#asyncOpRelease}} is called without a preceding {{GridCacheAdapter#asyncOpAcquire}}. This can lead to continuous growth of the permit count in {{GridCacheAdapter#asyncOpsSem}} and eventually to an overflow, followed by a failure of the node that started the transaction:
{code}
Critical system error detected. Will be handled accordingly to configured handler [hnd=o.a.i.i.processors.cache.transactions.TxAsyncOpsSemaphorePermitsExeededTest$$Lambda$42/1924582348@7379bebb, failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.Error: Maximum permit count exceeded]]
{code}
As you can see in [1], for a single cache context the transaction is committed by calling {{GridCacheAdapter#commitTxAsync}}, which later invokes {{GridCacheAdapter#asyncOpRelease}}. When multiple caches are affected by the transaction, {{GridNearTxLocal#commitNearTxLocalAsync}} is called to commit it, and {{GridCacheAdapter#asyncOpRelease}} is not invoked.

So, the greater the load (RPS / TPS) of such transactions, the sooner the node will fail.

Reproducer of the problem: [^SemaphorePermitsExceeded.patch]. It prints additional messages when the semaphore is released or acquired.

Links:
 # https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheSharedContext.java#L1122

> Frequent commits of single cache transactions can lead
> GridCacheAdapter#asyncOpsSem permits overflow
> ----------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17385
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17385
>             Project: Ignite
>          Issue Type: Bug
>   Affects Versions: 2.13
>            Reporter: Ilya Shishkov
>            Priority: Major
>              Labels: ise
>         Attachments: SemaphorePermitsExceeded.patch
>
>
> When you commit a transaction that was explicitly started over only a
> single cache, {{GridCacheAdapter#asyncOpRelease}} is called without a
> preceding {{GridCacheAdapter#asyncOpAcquire}}. This can lead to continuous
> growth of the permit count in {{GridCacheAdapter#asyncOpsSem}} and
> eventually to an overflow, followed by a failure of the node that started
> the transaction:
> {code}
> Critical system error detected. Will be handled accordingly to configured
> handler
> [hnd=o.a.i.i.processors.cache.transactions.TxAsyncOpsSemaphorePermitsExeededTest$$Lambda$42/1924582348@7379bebb,
> failureCtx=FailureContext [type=CRITICAL_ERROR, err=java.lang.Error: Maximum
> permit count exceeded]]
> {code}
> As you can see in [1], for a single cache context the transaction is
> committed by calling {{GridCacheAdapter#commitTxAsync}}, which later invokes
> {{GridCacheAdapter#asyncOpRelease}}. When multiple caches are affected by
> the transaction, {{GridNearTxLocal#commitNearTxLocalAsync}} is called to
> commit it, and {{GridCacheAdapter#asyncOpRelease}} is not invoked.
> So, the greater the load (RPS / TPS) of such single-cache transactions,
> the sooner the node will fail.
> Reproducer of the problem: [^SemaphorePermitsExceeded.patch]. It prints
> additional messages when the semaphore is released or acquired.
> Links:
> # https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/GridCacheSharedContext.java#L1122
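
The failure mode reported in this issue (releases that are never matched by an acquire, ending in "Maximum permit count exceeded") can be illustrated in isolation with a plain {{java.util.concurrent.Semaphore}}. The sketch below is not Ignite code and is not the attached reproducer. It assumes that {{asyncOpsSem}} behaves like a standard {{Semaphore}}, which the error message in the log suggests. The class name, the method name, and the starting permit count are hypothetical, chosen so that the overflow is reached after a few calls rather than after roughly two billion.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical stand-alone sketch; not part of Ignite. */
public class PermitOverflowSketch {
    /**
     * Releases permits without ever acquiring them and returns how many
     * releases succeeded before the semaphore reported an overflow.
     */
    static int unmatchedReleasesBeforeOverflow(int initialPermits) {
        Semaphore sem = new Semaphore(initialPermits);
        int succeeded = 0;

        try {
            while (true) {
                // Each unmatched release adds one permit to the counter.
                sem.release();
                succeeded++;
            }
        }
        catch (Error e) {
            // Semaphore throws java.lang.Error once the counter would
            // exceed Integer.MAX_VALUE.
            System.out.println("release #" + (succeeded + 1) + " failed: " + e.getMessage());
        }

        return succeeded;
    }

    public static void main(String[] args) {
        // Start close to the limit so the overflow happens quickly.
        int n = unmatchedReleasesBeforeOverflow(Integer.MAX_VALUE - 3);

        System.out.println("successful unmatched releases: " + n);
    }
}
```

Starting three permits below {{Integer.MAX_VALUE}}, three releases succeed and the fourth throws. With a small initial permit count the same overflow needs far more unmatched releases, which is consistent with the observation above that a higher transaction rate brings the failure sooner.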