[
https://issues.apache.org/jira/browse/IGNITE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Evgeny Stanilovsky updated IGNITE-18326:
----------------------------------------
Description:
Scenario
* Start grid of [CMG, MetaStorage, DataNode] nodes.
* Stop DataNode.
* Run an SQL query and wait on the future with a timeout.
* Observe: the query can't be started because the DataNode holding the
partition is absent, and the future throws CancelledException.
There is no way to close the cursor because the future failed. The implicit
transaction object can't be accessed.
* Start DataNode again.
* Run the same query again.
* Observe: the query fails because it can't lock the entry, since the previous
Tx wasn't committed or rolled back.
Most likely, no one read from the cursor, or we forget to close it when the
session is closed.
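The hypothesis above (a failed future leaves the implicit transaction unfinished because the caller never gets a cursor to close) can be illustrated with a minimal, self-contained sketch. ToyTx and runImplicit are hypothetical names made up for this illustration; they are not Ignite classes or methods, and the sketch does not claim to show how the SQL engine actually manages implicit transactions.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

public class ImplicitTxCleanupSketch {
    /** Hypothetical stand-in for a transaction; NOT the Ignite transaction API. */
    static final class ToyTx {
        final AtomicBoolean finished = new AtomicBoolean();

        void rollback() {
            finished.set(true);
        }
    }

    /**
     * Ties the implicit transaction to the query future: if the future fails,
     * the caller never receives a cursor to close, so the transaction is
     * rolled back here instead.
     */
    static <T> CompletableFuture<T> runImplicit(ToyTx tx, CompletableFuture<T> queryFuture) {
        return queryFuture.whenComplete((cursor, err) -> {
            if (err != null) {
                tx.rollback();
            }
        });
    }

    public static void main(String[] args) {
        ToyTx tx = new ToyTx();
        CompletableFuture<String> query = new CompletableFuture<>();
        CompletableFuture<String> result = runImplicit(tx, query);

        // Simulate the query being cancelled because the data node is absent.
        query.completeExceptionally(new CancellationException("query cancelled"));

        System.out.println("failed=" + result.isCompletedExceptionally()
                + ", txFinished=" + tx.finished.get());
    }
}
```

The point of the sketch is only that cleanup is attached to the future itself, so it does not depend on the caller holding a cursor or the transaction object.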
---- *UPDATED* ---
After some investigation I found that the tx is committed or rolled back
correctly; the only problem I can find for now is the one mentioned above: "it
can't lock the entry due to previous Tx". Check [1], the test called
*testImplicitTransaction0*: it does everything Andrey described above. It
sometimes passes, but frequently we get:
{noformat}
2023-01-09 14:41:53:674 +0300
[WARNING][ForkJoinPool.commonPool-worker-11][ReplicaManager] Failed to process
replica request [request=ReadWriteMultiRowReplicaRequestImpl
[binaryRows=ArrayList [org.apache.ignite.internal.schema.row.Row@57114800],
commitPartitionId=6c2142ce-3faa-4bc4-8ce7-7a5333bd92b9_part_0,
groupId=6c2142ce-3faa-4bc4-8ce7-7a5333bd92b9_part_0, requestType=RW_INSERT_ALL,
term=3, timestamp=HybridTimestamp [physical=1673264513670, logical=0],
transactionId=000edb17-d281-0000-8a18-8deb88e18dfa]]
java.util.concurrent.CompletionException:
org.apache.ignite.internal.tx.LockException: IGN-TX-5
TraceId:aa3bc7b7-f098-40eb-b1e1-a902e13933e0 Failed to acquire a lock due to a
conflict [txId=000edb17-d281-0000-8a18-8deb88e18dfa, waiter=WaiterImpl
[txId=000edb17-bb72-0000-8a18-8deb88e18dfa, upgraded=false, prevLockMode=null,
lockMode=X, locked=true, ex=null, isDone=true]]
Suppressed: java.lang.RuntimeException: This is a trimmed root
    at org.apache.ignite.internal.testframework.IgniteTestUtils.await(IgniteTestUtils.java:747)
    at org.apache.ignite.internal.testframework.IgniteTestUtils.await(IgniteTestUtils.java:767)
    at org.apache.ignite.internal.sql.engine.util.CursorUtils.getAllFromCursor(CursorUtils.java:70)
    at org.apache.ignite.internal.cluster.AbstractClusterStartStopTest.sql(AbstractClusterStartStopTest.java:269)
Caused by: org.apache.ignite.internal.tx.LockException: IGN-TX-5
TraceId:aa3bc7b7-f098-40eb-b1e1-a902e13933e0 Failed to acquire a lock due to a
conflict [txId=000edb17-d281-0000-8a18-8deb88e18dfa, waiter=WaiterImpl
[txId=000edb17-bb72-0000-8a18-8deb88e18dfa, upgraded=false, prevLockMode=null,
lockMode=X, locked=true, ex=null, isDone=true]]
    at app//org.apache.ignite.internal.tx.impl.HeapLockManager$LockState.isWaiterReadyToNotify(HeapLockManager.java:240)
    at app//org.apache.ignite.internal.tx.impl.HeapLockManager$LockState.tryAcquire(HeapLockManager.java:197)
    at app//org.apache.ignite.internal.tx.impl.HeapLockManager.acquire(HeapLockManager.java:76)
    at app//org.apache.ignite.internal.table.distributed.HashIndexLocker.locksForLookup(HashIndexLocker.java:68)
    at app//org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.resolveRowByPk(PartitionReplicaListener.java:1035)
    at app//org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processMultiEntryAction(PartitionReplicaListener.java:1228)
    at app//org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$invoke$0(PartitionReplicaListener.java:255)
{noformat}
[1] https://github.com/gridgain/apache-ignite-3/tree/ignite-18171-new-test
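To make the failure in the log easier to read: the request of one transaction (txId ending in d281-...) is rejected while a waiter of another transaction (txId ending in bb72-...) holds the key with lockMode=X. The toy model below reproduces only that shape. ToyLockManager is a hypothetical class written for this note, not HeapLockManager, and its "fail immediately on conflict" policy is an assumption made for the sketch, not a statement about the real conflict-resolution rules.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class ToyLockConflictSketch {
    /** Hypothetical exclusive-lock table (key -> owning tx id). NOT HeapLockManager. */
    static final class ToyLockManager {
        private final Map<String, UUID> owners = new HashMap<>();

        /** Takes an exclusive lock, or fails at once if another tx already owns the key. */
        synchronized void acquireExclusive(UUID txId, String key) {
            UUID owner = owners.putIfAbsent(key, txId);
            if (owner != null && !owner.equals(txId)) {
                throw new IllegalStateException(
                        "Failed to acquire a lock due to a conflict [txId=" + txId
                                + ", owner=" + owner + ']');
            }
        }

        /** Releases every lock held by the given tx (what commit or rollback would do). */
        synchronized void releaseAll(UUID txId) {
            owners.values().removeIf(txId::equals);
        }
    }

    public static void main(String[] args) {
        ToyLockManager locks = new ToyLockManager();
        UUID previousTx = UUID.randomUUID();
        UUID currentTx = UUID.randomUUID();

        // The first query's tx locks the row and its locks are not released yet.
        locks.acquireExclusive(previousTx, "row-1");

        try {
            // The repeated query runs in a new tx and hits the conflict.
            locks.acquireExclusive(currentTx, "row-1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Once the previous tx's locks are released, the same request succeeds.
        locks.releaseAll(previousTx);
        locks.acquireExclusive(currentTx, "row-1");
        System.out.println("acquired after release");
    }
}
```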
was:
Scenario
* Start grid of [CMG, MetaStorage, DataNode] nodes.
* Stop DataNode.
* Run an SQL query and wait on the future with a timeout.
* Observe: the query can't be started because the DataNode holding the
partition is absent, and the future throws CancelledException.
There is no way to close the cursor because the future failed. The implicit
transaction object can't be accessed.
* Start DataNode again.
* Run the same query again.
* Observe: the query fails because it can't lock the entry, since the previous
Tx wasn't committed or rolled back.
Most likely, no one read from the cursor, or we forget to close it when the
session is closed.
The reproducer is in the IGNITE-18171 PR, in the ignite-runner module:
org.apache.ignite.internal.cluster.ItNodeRestartTest#testImplicitTransaction
> SQL query may forget to finish implicit TX.
> -------------------------------------------
>
> Key: IGNITE-18326
> URL: https://issues.apache.org/jira/browse/IGNITE-18326
> Project: Ignite
> Issue Type: Bug
> Reporter: Andrey Mashenkov
> Assignee: Evgeny Stanilovsky
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>