[ https://issues.apache.org/jira/browse/IGNITE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evgeny Stanilovsky updated IGNITE-18326:
----------------------------------------
    Description: 
Scenario
* Start a grid of [CGM, MetaStorage, DataNode] nodes.
* Stop the DataNode.
* Run an SQL query and wait on the future until it times out.
* Observe: the query can't be started because the DataNode holding the
partition is absent, and the future throws CancelledException.
There is no way to close the cursor because the future failed, and the
implicit transaction object can't be accessed.
* Start the DataNode again.
* Run the same query again.
* Observe: the query fails because it can't lock the entry, since the
previous Tx wasn't committed or rolled back.

Most likely, no one read from the cursor, or we forgot to close it when the
session was closed.
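
To illustrate the failure mode hypothesized in the scenario, here is a minimal
sketch of rolling back an implicit transaction when the query future fails or
is cancelled before the caller ever receives a cursor. Everything in it is an
assumption made for illustration: Tx, RecordingTx and runImplicit are
hypothetical names, not Ignite 3 types or APIs, and only
java.util.concurrent.CompletableFuture from the JDK is used.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

public class ImplicitTxSketch {
    /** Hypothetical stand-in for an implicit transaction; not an Ignite type. */
    interface Tx {
        CompletableFuture<Void> rollback();
    }

    /** Test double that only records whether rollback was requested. */
    static final class RecordingTx implements Tx {
        final AtomicBoolean rolledBack = new AtomicBoolean();

        @Override
        public CompletableFuture<Void> rollback() {
            rolledBack.set(true);
            return CompletableFuture.completedFuture(null);
        }
    }

    /**
     * Starts a query inside the given implicit transaction (hypothetical helper).
     * If the cursor future fails or is cancelled, the caller never gets a cursor
     * to close and has no handle on the transaction, so the rollback is done here.
     */
    static <C> CompletableFuture<C> runImplicit(Tx tx, Function<Tx, CompletableFuture<C>> startQuery) {
        CompletableFuture<C> cursorFut = startQuery.apply(tx);

        cursorFut.whenComplete((cursor, err) -> {
            if (err != null) {
                // Failure or cancellation: release the locks held by the implicit tx.
                tx.rollback();
            }
        });

        return cursorFut;
    }

    public static void main(String[] args) {
        RecordingTx tx = new RecordingTx();
        // Simulates a query that can never start because the data node is down.
        CompletableFuture<String> never = new CompletableFuture<>();

        CompletableFuture<String> res = runImplicit(tx, t -> never);
        res.cancel(true); // the caller gives up after a timeout

        System.out.println("rolled back: " + tx.rolledBack.get());
    }
}
```

On the success path the sketch leaves the transaction alone, on the assumption
that closing the cursor is what finishes it.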

---- *UPDATED* ---

After some investigation, I found that the tx is committed and rolled back
correctly. The only problem I can find for now is the one mentioned above:
"it can't lock the entry due to previous Tx". See [1], test
*testImplicitTransaction0*: it performs all the steps described above by
Andrey. It sometimes passes, but frequently we get:


{noformat}
2023-01-09 14:41:53:674 +0300 [WARNING][ForkJoinPool.commonPool-worker-11][ReplicaManager] Failed to process replica request [request=ReadWriteMultiRowReplicaRequestImpl [binaryRows=ArrayList [org.apache.ignite.internal.schema.row.Row@57114800], commitPartitionId=6c2142ce-3faa-4bc4-8ce7-7a5333bd92b9_part_0, groupId=6c2142ce-3faa-4bc4-8ce7-7a5333bd92b9_part_0, requestType=RW_INSERT_ALL, term=3, timestamp=HybridTimestamp [physical=1673264513670, logical=0], transactionId=000edb17-d281-0000-8a18-8deb88e18dfa]]
java.util.concurrent.CompletionException: org.apache.ignite.internal.tx.LockException: IGN-TX-5 TraceId:aa3bc7b7-f098-40eb-b1e1-a902e13933e0 Failed to acquire a lock due to a conflict [txId=000edb17-d281-0000-8a18-8deb88e18dfa, waiter=WaiterImpl [txId=000edb17-bb72-0000-8a18-8deb88e18dfa, upgraded=false, prevLockMode=null, lockMode=X, locked=true, ex=null, isDone=true]]
        Suppressed: java.lang.RuntimeException: This is a trimmed root
                at org.apache.ignite.internal.testframework.IgniteTestUtils.await(IgniteTestUtils.java:747)
                at org.apache.ignite.internal.testframework.IgniteTestUtils.await(IgniteTestUtils.java:767)
                at org.apache.ignite.internal.sql.engine.util.CursorUtils.getAllFromCursor(CursorUtils.java:70)
                at org.apache.ignite.internal.cluster.AbstractClusterStartStopTest.sql(AbstractClusterStartStopTest.java:269)
Caused by: org.apache.ignite.internal.tx.LockException: IGN-TX-5 TraceId:aa3bc7b7-f098-40eb-b1e1-a902e13933e0 Failed to acquire a lock due to a conflict [txId=000edb17-d281-0000-8a18-8deb88e18dfa, waiter=WaiterImpl [txId=000edb17-bb72-0000-8a18-8deb88e18dfa, upgraded=false, prevLockMode=null, lockMode=X, locked=true, ex=null, isDone=true]]
        at app//org.apache.ignite.internal.tx.impl.HeapLockManager$LockState.isWaiterReadyToNotify(HeapLockManager.java:240)
        at app//org.apache.ignite.internal.tx.impl.HeapLockManager$LockState.tryAcquire(HeapLockManager.java:197)
        at app//org.apache.ignite.internal.tx.impl.HeapLockManager.acquire(HeapLockManager.java:76)
        at app//org.apache.ignite.internal.table.distributed.HashIndexLocker.locksForLookup(HashIndexLocker.java:68)
        at app//org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.resolveRowByPk(PartitionReplicaListener.java:1035)
        at app//org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processMultiEntryAction(PartitionReplicaListener.java:1228)
        at app//org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$invoke$0(PartitionReplicaListener.java:255)
{noformat}


[1] https://github.com/gridgain/apache-ignite-3/tree/ignite-18171-new-test

  was:
Scenario
* Start a grid of [CGM, MetaStorage, DataNode] nodes.
* Stop the DataNode.
* Run an SQL query and wait on the future until it times out.
* Observe: the query can't be started because the DataNode holding the
partition is absent, and the future throws CancelledException.
There is no way to close the cursor because the future failed, and the
implicit transaction object can't be accessed.
* Start the DataNode again.
* Run the same query again.
* Observe: the query fails because it can't lock the entry, since the
previous Tx wasn't committed or rolled back.

Most likely, no one read from the cursor, or we forgot to close it when the
session was closed.
Find the reproducer in the IGNITE-18171 PR, in the ignite-runner module:
org.apache.ignite.internal.cluster.ItNodeRestartTest#testImplicitTransaction


> SQL query may forget to finish implicit TX.
> -------------------------------------------
>
>                 Key: IGNITE-18326
>                 URL: https://issues.apache.org/jira/browse/IGNITE-18326
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Andrey Mashenkov
>            Assignee: Evgeny Stanilovsky
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0.0-beta2


