[ 
https://issues.apache.org/jira/browse/IGNITE-27638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053482#comment-18053482
 ] 

Ivan Bessonov edited comment on IGNITE-27638 at 1/22/26 9:01 AM:
-----------------------------------------------------------------

The issue is that {{onUpdate}} can be called twice for an invoke closure when 
the tree performs a "replace" operation, which is what happens on abort when 
we're aborting an "update" rather than an "insert". So, imagine we have the 
following tree:
{code:java}
  A
 / \
A   B {code}
Other values are irrelevant. The WI list happens to look like this:
{code:java}
x <-> A <-> B <-> y{code}
The following data race corrupts the WI list:
 * "tail" for A is locked, node for B is locked too. They are mutually 
exclusive and it is possible to hold these locks independently. If this 
particular picture is not convincing, please add a new root and put A and B 
into different independent sub-trees.
 * {{onUpdate}} is called the first time for A. The picture is the following:
{{A.prev = x}}
{{A.next = B}}
{{x -- B -- y}}
 * B is aborted, its link is invalidated. The picture is the following:
{{A.prev = x}}
{{A.next = B}}
{{x -- y}}
 * Now {{onUpdate}} is called a second time for A. The picture is the following:
{{x.next = B}}
{{B.prev = x}}
{{y.prev = x}}
 * We have a pointer to nowhere. The {{B.prev}} write could actually corrupt 
another node {{C}} that happens to be what link B resolves to at that moment, 
and that node could itself be invalidated soon.

It is possible to corrupt the "prev" link in the same way using a symmetric 
WI list.
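
The sequence above can be replayed with a minimal in-memory sketch. It assumes 
a plain doubly linked list stands in for the WI list; the {{Node}} class and 
the {{link}}/{{unlink}} helpers are hypothetical names for illustration and are 
not Ignite code. {{unlink}} deliberately leaves the removed node's own 
{{prev}}/{{next}} untouched, so a second call works from stale links:

```java
final class WiListRaceSketch {
    /** Hypothetical stand-in for a write intent in the doubly linked WI list. */
    static final class Node {
        final String name;
        Node prev;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    /** Links the given nodes into a doubly linked list in argument order. */
    static void link(Node... nodes) {
        for (int i = 0; i + 1 < nodes.length; i++) {
            nodes[i].next = nodes[i + 1];
            nodes[i + 1].prev = nodes[i];
        }
    }

    /**
     * Removes the node from the list by rewiring its neighbours.
     * Assumption: the node's own prev/next are left as they were,
     * so calling this twice reuses stale neighbour references.
     */
    static void unlink(Node node) {
        if (node.prev != null) {
            node.prev.next = node.next;
        }
        if (node.next != null) {
            node.next.prev = node.prev;
        }
    }

    /** Replays the scenario and returns the nodes as {x, A, B, y}. */
    static Node[] runScenario() {
        Node x = new Node("x");
        Node a = new Node("A");
        Node b = new Node("B");
        Node y = new Node("y");
        link(x, a, b, y);   // x <-> A <-> B <-> y

        unlink(a);          // first onUpdate for A:  x -- B -- y
        unlink(b);          // B is aborted:          x -- y
        unlink(a);          // second onUpdate for A, using stale A.prev / A.next

        return new Node[] {x, a, b, y};
    }

    public static void main(String[] args) {
        Node[] n = runScenario();
        Node x = n[0];
        Node b = n[2];
        Node y = n[3];

        // x now points forward at the already removed B, while y still points back at x.
        System.out.println("x.next = " + x.next.name);
        System.out.println("B.prev = " + b.prev.name);
        System.out.println("y.prev = " + y.prev.name);
    }
}
```

In this sketch B is still a live Java object, so the damage is only a broken 
forward chain. In the storage the link of B has already been invalidated, so 
the same write goes to whatever that link resolves to at the moment.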


> Exception in AbortWriteInvokeClosure#onUpdate
> ---------------------------------------------
>
>                 Key: IGNITE-27638
>                 URL: https://issues.apache.org/jira/browse/IGNITE-27638
>             Project: Ignite
>          Issue Type: Bug
>          Components: storage engines ai3
>            Reporter: Ivan Bessonov
>            Assignee: Ivan Bessonov
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Deletion from the WI list is sometimes performed twice. Given this comment:
> {code:java}
> // We don't zero out removed write intent's WI links because we already unlinked it everywhere except for WI list itself,
> // so no one can read its WI links, and we are going to remove it from WI list under the WI list lock. {code}
> the data consistency is violated and we get an error:
> {code:java}
> Caused by: 
> org.apache.ignite.internal.pagememory.freelist.CorruptedFreeListException: 
> IGN-STORAGE-2 Failed to update data row TraceId:4cc5584d
> at 
> org.apache.ignite.internal.pagememory.freelist.FreeListImpl.updateDataRow(FreeListImpl.java:674)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.WriteIntentListSupport.removeNodeFromWriteIntentsList(WriteIntentListSupport.java:42)
> ... 64 more
> Caused by: java.lang.IllegalStateException: Item not found: 17
> at 
> org.apache.ignite.internal.pagememory.io.DataPageIo.findIndirectItemIndex(DataPageIo.java:453)
> at 
> org.apache.ignite.internal.pagememory.io.DataPageIo.resolveDirectItemIdFromIndirectItemId(DataPageIo.java:605)
> at 
> org.apache.ignite.internal.pagememory.io.DataPageIo.getDataOffset(DataPageIo.java:596)
> at 
> org.apache.ignite.internal.pagememory.io.DataPageIo.getPayloadOffset(DataPageIo.java:720)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.UpdateNextWiLinkHandler.run(UpdateNextWiLinkHandler.java:34)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.UpdateNextWiLinkHandler.run(UpdateNextWiLinkHandler.java:19)
> at 
> org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:220)
> at 
> org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:271)
> at 
> org.apache.ignite.internal.pagememory.freelist.FreeListImpl.updateDataRow(FreeListImpl.java:664)
> ... 65 more
> 2026-01-21 15:07:34:526 +0000 
> [WARNING][%cac-dpd-cde-gg-aks-dev-3%partition-operations-4][TxCleanupRequestSender]
>  First cleanup attempt failed (the transaction outcome is not affected) 
> [txId=019be118-132c-00ab-399b-327000000001]
> java.util.concurrent.CompletionException: 
> org.apache.ignite.internal.storage.StorageException: IGN-CMN-65535 Error 
> while updating WI links: [link=1081145415610400959, rowId=RowId 
> [partitionId=7, uuid=0000019b-e109-68e4-04dc-cf202c264365], 
> txId=019be118-132c-00ab-399b-327000000001, tableId=1171, partitionId=7] 
> TraceId:65a31616
> at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown 
> Source)
> at java.base/java.util.concurrent.CompletableFuture.uniApplyNow(Unknown 
> Source)
> at java.base/java.util.concurrent.CompletableFuture.uniApplyStage(Unknown 
> Source)
> at java.base/java.util.concurrent.CompletableFuture.thenApply(Unknown Source)
> at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processTableWriteIntentSwitchAction(PartitionReplicaListener.java:1986)
> at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processOperationRequest(PartitionReplicaListener.java:660)
> at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processOperationRequestWithTxOperationManagementLogic(PartitionReplicaListener.java:4235)
> at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processRequest(PartitionReplicaListener.java:520)
> at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processRequestInContext(PartitionReplicaListener.java:480)
> at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.process(PartitionReplicaListener.java:472)
> at 
> org.apache.ignite.internal.partition.replicator.handlers.WriteIntentSwitchRequestHandler.lambda$invokeTableWriteIntentSwitchReplicaRequest$4(WriteIntentSwitchRequestHandler.java:179)
> at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(Unknown 
> Source)
> at java.base/java.util.concurrent.CompletableFuture.thenCompose(Unknown 
> Source)
> at 
> org.apache.ignite.internal.partition.replicator.handlers.WriteIntentSwitchRequestHandler.invokeTableWriteIntentSwitchReplicaRequest(WriteIntentSwitchRequestHandler.java:165)
> at 
> org.apache.ignite.internal.partition.replicator.handlers.WriteIntentSwitchRequestHandler.lambda$handle$0(WriteIntentSwitchRequestHandler.java:118)
> at java.base/java.util.stream.ReferencePipeline$3$1.accept(Unknown Source)
> at java.base/java.util.Iterator.forEachRemaining(Unknown Source)
> at 
> java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Unknown 
> Source)
> at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
> at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
> at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown 
> Source)
> at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
> at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
> at 
> org.apache.ignite.internal.partition.replicator.handlers.WriteIntentSwitchRequestHandler.handle(WriteIntentSwitchRequestHandler.java:119)
> at 
> org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.processRequest(ZonePartitionReplicaListener.java:232)
> at 
> org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.lambda$invoke$0(ZonePartitionReplicaListener.java:208)
> at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(Unknown 
> Source)
> at java.base/java.util.concurrent.CompletableFuture.thenCompose(Unknown 
> Source)
> at 
> org.apache.ignite.internal.partition.replicator.ZonePartitionReplicaListener.invoke(ZonePartitionReplicaListener.java:208)
> at 
> org.apache.ignite.internal.replicator.ZonePartitionReplicaImpl.processRequest(ZonePartitionReplicaImpl.java:67)
> at 
> org.apache.ignite.internal.replicator.ReplicaManager.handleReplicaRequest(ReplicaManager.java:382)
> at 
> org.apache.ignite.internal.replicator.ReplicaManager.lambda$onReplicaMessageReceived$0(ReplicaManager.java:313)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.ignite.internal.storage.StorageException: IGN-CMN-65535 
> Error while updating WI links: [link=1081145415610400959, rowId=RowId 
> [partitionId=7, uuid=0000019b-e109-68e4-04dc-cf202c264365], 
> txId=019be118-132c-00ab-399b-327000000001, tableId=1171, partitionId=7] 
> TraceId:65a31616
> at 
> org.apache.ignite.internal.storage.pagememory.mv.WriteIntentListSupport.removeNodeFromWriteIntentsList(WriteIntentListSupport.java:55)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.WiLinkableRowVersionOperations.removeFromWriteIntentsList(WiLinkableRowVersionOperations.java:42)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.AbortWriteInvokeClosure.onUpdate(AbortWriteInvokeClosure.java:110)
> at 
> org.apache.ignite.internal.pagememory.tree.BplusTree$Put.replaceRowInPage(BplusTree.java:4246)
> at 
> org.apache.ignite.internal.pagememory.tree.BplusTree$Put.finishTail(BplusTree.java:4020)
> at 
> org.apache.ignite.internal.pagememory.tree.BplusTree$Invoke.tryFinish(BplusTree.java:4501)
> at 
> org.apache.ignite.internal.pagememory.tree.BplusTree.invoke(BplusTree.java:2204)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.lambda$abortWrite$19(AbstractPageMemoryMvPartitionStorage.java:672)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:1086)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.abortWrite(AbstractPageMemoryMvPartitionStorage.java:664)
> at 
> org.apache.ignite.internal.table.distributed.raft.snapshot.SnapshotAwarePartitionDataStorage.abortWrite(SnapshotAwarePartitionDataStorage.java:149)
> at 
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.performAbortWrite(StorageUpdateHandler.java:478)
> at 
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.lambda$switchWriteIntents$2(StorageUpdateHandler.java:422)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.lambda$runConsistently$1(PersistentPageMemoryMvPartitionStorage.java:207)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:1086)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.runConsistently(PersistentPageMemoryMvPartitionStorage.java:197)
> at 
> org.apache.ignite.internal.table.distributed.raft.snapshot.SnapshotAwarePartitionDataStorage.runConsistently(SnapshotAwarePartitionDataStorage.java:83)
> at 
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.switchWriteIntents(StorageUpdateHandler.java:409)
> at 
> org.apache.ignite.internal.table.distributed.StorageUpdateHandler.switchWriteIntents(StorageUpdateHandler.java:376)
> at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.applyWriteIntentSwitchCommandLocally(PartitionReplicaListener.java:2042)
> at 
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processTableWriteIntentSwitchAction$76(PartitionReplicaListener.java:1988)
> ... 34 more
> Caused by: 
> org.apache.ignite.internal.pagememory.freelist.CorruptedFreeListException: 
> IGN-STORAGE-2 Failed to update data row TraceId:65a31616
> at 
> org.apache.ignite.internal.pagememory.freelist.FreeListImpl.updateDataRow(FreeListImpl.java:674)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.WriteIntentListSupport.removeNodeFromWriteIntentsList(WriteIntentListSupport.java:42)
> ... 54 more
> Caused by: java.lang.IllegalStateException: Item not found: 17
> at 
> org.apache.ignite.internal.pagememory.io.DataPageIo.findIndirectItemIndex(DataPageIo.java:453)
> at 
> org.apache.ignite.internal.pagememory.io.DataPageIo.resolveDirectItemIdFromIndirectItemId(DataPageIo.java:605)
> at 
> org.apache.ignite.internal.pagememory.io.DataPageIo.getDataOffset(DataPageIo.java:596)
> at 
> org.apache.ignite.internal.pagememory.io.DataPageIo.getPayloadOffset(DataPageIo.java:720)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.UpdateNextWiLinkHandler.run(UpdateNextWiLinkHandler.java:34)
> at 
> org.apache.ignite.internal.storage.pagememory.mv.UpdateNextWiLinkHandler.run(UpdateNextWiLinkHandler.java:19)
> at 
> org.apache.ignite.internal.pagememory.util.PageHandler.writePage(PageHandler.java:220)
> at 
> org.apache.ignite.internal.pagememory.datastructure.DataStructure.write(DataStructure.java:271)
> at 
> org.apache.ignite.internal.pagememory.freelist.FreeListImpl.updateDataRow(FreeListImpl.java:664)
> ... 55 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
