Anton Vinogradov created IGNITE-16374:
-----------------------------------------
Summary: Entry Processor may cause transactional cache
inconsistency
Key: IGNITE-16374
URL: https://issues.apache.org/jira/browse/IGNITE-16374
Project: Ignite
Issue Type: Bug
Reporter: Anton Vinogradov
At two-phase commit, primary invokes EP on prepare phase.
In case EP will cause an exception, the operation will be retried or failed
(when retry also caused an exception).
Typical stacktrace is:
{code}
at
org.apache.ignite.internal.processors.cache.EntryProcessorResourceInjectorProxy.process(EntryProcessorResourceInjectorProxy.java:68)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.onEntriesLocked(GridDhtTxPrepareFuture.java:460)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1325)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.mapIfLocked(GridDhtTxPrepareFuture.java:732)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare(GridDhtTxPrepareFuture.java:1138)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareAsyncLocal(GridNearTxLocal.java:4458)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.prepareColocatedTx(IgniteTxHandler.java:302)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.proceedPrepare(GridNearOptimisticTxPrepareFuture.java:565)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepareSingle(GridNearOptimisticTxPrepareFuture.java:392)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepare0(GridNearOptimisticTxPrepareFuture.java:335)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepareOnTopology(GridNearOptimisticTxPrepareFutureAdapter.java:205)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepare(GridNearOptimisticTxPrepareFutureAdapter.java:129)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareNearTxLocal(GridNearTxLocal.java:4122)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.commitNearTxLocalAsync(GridNearTxLocal.java:4170)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.optimisticPutFuture(GridNearTxLocal.java:3066)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAsync0(GridNearTxLocal.java:731)
at
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.invokeAsync(GridNearTxLocal.java:508)
at
org.apache.ignite.internal.processors.cache.GridCacheAdapter$25.op(GridCacheAdapter.java:2695)
at
org.apache.ignite.internal.processors.cache.GridCacheAdapter$25.op(GridCacheAdapter.java:2682)
at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4381)
at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.invoke0(GridCacheAdapter.java:2682)
at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.invoke(GridCacheAdapter.java:2663)
at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.invoke(IgniteCacheProxyImpl.java:1735)
at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.invoke(GatewayProtectedCacheProxy.java:1254)
{code}
At the same time, EP processing on backup will happen in the finishing phase.
At this time, primary already committed EP execution result, while EP
processing on backup is not guaranteed to be successful.
So, Exceptions during EP execution on backup will cause inconsistency.
Typical stacktrace is:
{code}
at
org.apache.ignite.internal.processors.cache.EntryProcessorResourceInjectorProxy.process(EntryProcessorResourceInjectorProxy.java:68)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.applyTransformClosures(IgniteTxAdapter.java:1700)
at
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:574)
at
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitRemoteTx(GridDistributedTxRemoteAdapter.java:886)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:1471)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxFinishRequest(IgniteTxHandler.java:1393)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$600(IgniteTxHandler.java:134)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:258)
at
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:256)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1151)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:592)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:110)
at
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:309)
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
at
org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
at
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
at
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:569)
at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
{code}
Possible solutions are
1) Retry on execution fail on backup without locks release until committed.
This should guarantee consistency when successful EP execution is possible
after some retries.
But, this may cause a neverending unsuccessful retry story and even cause
cluster failure.
2) EP execution on backups also should happen in the preparation phase.
In this case, we will have all the data prepared on the finishing phase and
have no chance to fail because of unsuccessful EP execution.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)