[
https://issues.apache.org/jira/browse/IGNITE-23926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904363#comment-17904363
]
Kirill Tkalenko edited comment on IGNITE-23926 at 12/10/24 6:54 AM:
--------------------------------------------------------------------
The error occurs because the test does not have time to finish within the
timeout and starts to fail.
Which results in the execution of sql *DROP TABLE Person*, transactions that
were created after this should no longer see the table and its indexes
accordingly, but they just saw a new catalogVersion in which the table no
longer existed.
Here are the log messages that led to this conclusion:
{noformat}
[05:21:18] : [:ignite-jdbc:integrationTest]
[2024-12-10T05:21:17,676][INFO
][%ijbst_n_0%JRaft-FSMCaller-Disruptor_stripe_4-0][TableUtils] >>>>>
TableUtils#indexIdsAtRwTxBeginTs: [txId=0193af03-bb5f-0020-de3d-986200000001,
txBTs=HybridTimestamp [physical=2024-12-10 05:21:17:663 +0000, logical=32,
composite=113626846177722400], tableId=16, catalogVersion=7, indexSize=1]
[05:21:18] : [:ignite-jdbc:integrationTest]
[2024-12-10T05:21:17,691][INFO ][Test worker][ItJdbcBatchSelfTest] >>> Stopping
test: ItJdbcBatchSelfTest#testHeterogeneousBatchException, displayName:
testHeterogeneousBatchException(), cost: 2043ms.
[05:21:18] : [:ignite-jdbc:integrationTest]
[2024-12-10T05:21:17,691][INFO ][Test worker][JdbcStatement] >>>>>
JdbcStatement#execute: [sql=DROP TABLE Person;, args=[]]
[05:21:18]W: [:ignite-jdbc:integrationTest]
[2024-12-10T05:21:17,799][WARN
][%ijbst_n_0%partition-operations-15][ReplicaManager] Failed to process replica
request [request=WriteIntentSwitchReplicaRequestImpl [commit=false,
commitTimestamp=null, groupId=TablePartitionIdMessageImpl [partitionId=16,
tableId=16], timestamp=HybridTimestamp [physical=2024-12-10 05:21:17:798 +0000,
logical=1, composite=113626846186569729],
txId=0193af03-bbe4-0002-de3d-986200000001]].
[05:21:18]W: [:ignite-jdbc:integrationTest]
java.util.concurrent.CompletionException: java.lang.AssertionError:
txId=0193af03-bbe4-0002-de3d-986200000001, tableId=16, catalogVersion=8
{noformat}
What can be done?
* Prevent transactions from being created and attempting to finish after a
table destroy event.
* Improve error message in
*org.apache.ignite.internal.table.distributed.TableUtils#indexIdsAtRwTxBeginTs*
something like: there are no indexes because the table was deleted.
* Improve error logging:
**
{noformat}
Failed to process replica request [request=ReadWriteMultiRowReplicaRequestImpl
[binaryTuples=ArrayList [java.nio.HeapByteBuffer[pos=0 lim=21 cap=21]],
commitPartitionId=TablePartitionIdMessageImpl [partitionId=4, tableId=16],
coordinatorId=04805429-748a-44ef-af63-b4a7e5fc1917, deleted=null,
enlistmentConsistencyToken=113626726969180168, full=false,
groupId=TablePartitionIdMessageImpl [partitionId=24, tableId=16],
requestType=RW_INSERT_ALL, schemaVersion=1, skipDelayedAck=true, tableId=16,
timestamp=HybridTimestamp [physical=2024-12-10 05:21:17:789 +0000, logical=3,
composite=113626846185979907],
transactionId=0193af03-bbdc-0020-de3d-986200000001]].
{noformat}
there is no need to output a list of tuples here
** in total, the logs are 5 GB and they contain a lot of similar messages with
the same stack trace, which makes the logs very loaded and impossible to read.
was (Author: [email protected]):
The error occurs because the test does not have time to finish within the
timeout and starts to fail.
Which results in the execution of sql *DROP TABLE Person*, transactions that
were created after this should no longer see the table and its indexes
accordingly, but they just saw a new catalogVersion in which the table no
longer existed.
Here are the log messages that led to this conclusion:
{noformat}
[05:21:18] : [:ignite-jdbc:integrationTest]
[2024-12-10T05:21:17,676][INFO
][%ijbst_n_0%JRaft-FSMCaller-Disruptor_stripe_4-0][TableUtils] >>>>>
TableUtils#indexIdsAtRwTxBeginTs: [txId=0193af03-bb5f-0020-de3d-986200000001,
txBTs=HybridTimestamp [physical=2024-12-10 05:21:17:663 +0000, logical=32,
composite=113626846177722400], tableId=16, catalogVersion=7, indexSize=1]
[05:21:18] : [:ignite-jdbc:integrationTest]
[2024-12-10T05:21:17,691][INFO ][Test worker][ItJdbcBatchSelfTest] >>> Stopping
test: ItJdbcBatchSelfTest#testHeterogeneousBatchException, displayName:
testHeterogeneousBatchException(), cost: 2043ms.
[05:21:18] : [:ignite-jdbc:integrationTest]
[2024-12-10T05:21:17,691][INFO ][Test worker][JdbcStatement] >>>>>
JdbcStatement#execute: [sql=DROP TABLE Person;, args=[]]
[05:21:18]W: [:ignite-jdbc:integrationTest]
[2024-12-10T05:21:17,799][WARN
][%ijbst_n_0%partition-operations-15][ReplicaManager] Failed to process replica
request [request=WriteIntentSwitchReplicaRequestImpl [commit=false,
commitTimestamp=null, groupId=TablePartitionIdMessageImpl [partitionId=16,
tableId=16], timestamp=HybridTimestamp [physical=2024-12-10 05:21:17:798 +0000,
logical=1, composite=113626846186569729],
txId=0193af03-bbe4-0002-de3d-986200000001]].
[05:21:18]W: [:ignite-jdbc:integrationTest]
java.util.concurrent.CompletionException: java.lang.AssertionError:
txId=0193af03-bbe4-0002-de3d-986200000001, tableId=16, catalogVersion=8
{noformat}
What can be done?
* Improve error message in
*org.apache.ignite.internal.table.distributed.TableUtils#indexIdsAtRwTxBeginTs*
something like: there are no indexes because the table was deleted.
* Somewhere set the *catalogVersion* at the moment of the transaction start/
and then work with it everywhere, but here we need to check whether there will
be problems if we create new indexes in which we could write data.
* Improve error logging:
**
{noformat}
Failed to process replica request [request=ReadWriteMultiRowReplicaRequestImpl
[binaryTuples=ArrayList [java.nio.HeapByteBuffer[pos=0 lim=21 cap=21]],
commitPartitionId=TablePartitionIdMessageImpl [partitionId=4, tableId=16],
coordinatorId=04805429-748a-44ef-af63-b4a7e5fc1917, deleted=null,
enlistmentConsistencyToken=113626726969180168, full=false,
groupId=TablePartitionIdMessageImpl [partitionId=24, tableId=16],
requestType=RW_INSERT_ALL, schemaVersion=1, skipDelayedAck=true, tableId=16,
timestamp=HybridTimestamp [physical=2024-12-10 05:21:17:789 +0000, logical=3,
composite=113626846185979907],
transactionId=0193af03-bbdc-0020-de3d-986200000001]].
{noformat}
there is no need to output a list of tuples here
** in total, the logs are 5 GB and they contain a lot of similar messages with
the same stack trace, which makes the logs very loaded and impossible to read.
> Investigate the flaky ItJdbcBatchSelfTest
> -----------------------------------------
>
> Key: IGNITE-23926
> URL: https://issues.apache.org/jira/browse/IGNITE-23926
> Project: Ignite
> Issue Type: Task
> Reporter: Kirill Tkalenko
> Assignee: Kirill Tkalenko
> Priority: Major
> Labels: ignite-3
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In
> [JDBC|https://ci.ignite.apache.org/viewLog.html?buildId=8680682&tab=buildResultsDiv&buildTypeId=ApacheIgnite3xGradle_Test_IntegrationTests_ModuleJdbc]
> there is error
> {code:java}
> java.util.concurrent.CompletionException: java.lang.AssertionError:
> txId=01939e8e-3d9a-001c-de3d-986200000001, tableId=16, catalogVersion=8
> {code}
> for *ItJdbcBatchSelfTest*, we need to find out what the reason is.
> StackTrace:
> {noformat}
> [00:39:02]W: [:ignite-jdbc:integrationTest]
> java.util.concurrent.CompletionException: java.lang.AssertionError:
> txId=01939e8e-3d9a-001c-de3d-986200000001, tableId=16, catalogVersion=8
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.uniApplyNow(CompletableFuture.java:687)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:662)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:2168)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processWriteIntentSwitchAction$62(PartitionReplicaListener.java:1807)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processWriteIntentSwitchAction(PartitionReplicaListener.java:1802)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processOperationRequest(PartitionReplicaListener.java:763)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processOperationRequestWithTxRwCounter(PartitionReplicaListener.java:3977)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.processRequest(PartitionReplicaListener.java:536)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$invoke$1(PartitionReplicaListener.java:473)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1187)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2309)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.invoke(PartitionReplicaListener.java:473)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.replicator.ReplicaImpl.processRequest(ReplicaImpl.java:155)
> ~[ignite-replicator-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.replicator.ReplicaManager.handleReplicaRequest(ReplicaManager.java:450)
> ~[ignite-replicator-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.replicator.ReplicaManager.onReplicaMessageReceived(ReplicaManager.java:383)
> ~[ignite-replicator-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.DefaultMessagingService.sendToSelf(DefaultMessagingService.java:388)
> ~[ignite-network-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.DefaultMessagingService.invoke0(DefaultMessagingService.java:316)
> ~[ignite-network-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:245)
> ~[ignite-network-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.wrapper.JumpToExecutorByConsistentIdAfterSend.invoke(JumpToExecutorByConsistentIdAfterSend.java:97)
> ~[ignite-network-api-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:198)
> ~[ignite-network-api-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.replicator.ReplicaService.sendToReplicaRaw(ReplicaService.java:138)
> ~[ignite-replicator-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.replicator.ReplicaService.sendToReplica(ReplicaService.java:122)
> ~[ignite-replicator-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.replicator.ReplicaService.invoke(ReplicaService.java:289)
> ~[ignite-replicator-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.tx.impl.TxMessageSender.switchWriteIntents(TxMessageSender.java:102)
> ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.tx.impl.WriteIntentSwitchProcessor.switchLocalWriteIntents(WriteIntentSwitchProcessor.java:77)
> ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.tx.impl.TxCleanupRequestHandler.processTxCleanup(TxCleanupRequestHandler.java:130)
> ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.tx.impl.TxCleanupRequestHandler.lambda$start$0(TxCleanupRequestHandler.java:108)
> ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.TrackableNetworkMessageHandler.onReceived(TrackableNetworkMessageHandler.java:52)
> ~[ignite-network-api-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.DefaultMessagingService.sendToSelf(DefaultMessagingService.java:388)
> ~[ignite-network-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.DefaultMessagingService.invoke0(DefaultMessagingService.java:316)
> ~[ignite-network-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:245)
> ~[ignite-network-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.wrapper.JumpToExecutorByConsistentIdAfterSend.invoke(JumpToExecutorByConsistentIdAfterSend.java:97)
> ~[ignite-network-api-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:198)
> ~[ignite-network-api-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.tx.impl.TxMessageSender.cleanup(TxMessageSender.java:131)
> ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.tx.impl.TxCleanupRequestSender.sendCleanupMessageWithRetries(TxCleanupRequestSender.java:240)
> ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.tx.impl.TxCleanupRequestSender.cleanupPartitions(TxCleanupRequestSender.java:226)
> ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.tx.impl.TxCleanupRequestSender.cleanup(TxCleanupRequestSender.java:165)
> ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.tx.impl.TxManagerImpl.cleanup(TxManagerImpl.java:826)
> ~[ignite-transactions-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$finishAndCleanup$58(PartitionReplicaListener.java:1692)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$applyCmdWithRetryOnSafeTimeReorderException$114(PartitionReplicaListener.java:2680)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> [?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> [?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> [?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:614)
> [?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:653)
> [?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
> [?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> [?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> [?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.lang.Thread.run(Thread.java:833) [?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] Caused by:
> java.lang.AssertionError: txId=01939e8e-3d9a-001c-de3d-986200000001,
> tableId=16, catalogVersion=8
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.TableUtils.indexIdsAtRwTxBeginTs(TableUtils.java:58)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.indexIdsAtRwTxBeginTs(PartitionReplicaListener.java:4045)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.applyWriteIntentSwitchCommand(PartitionReplicaListener.java:1882)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener.lambda$processWriteIntentSwitchAction$61(PartitionReplicaListener.java:1809)
> ~[ignite-table-3.0.0-SNAPSHOT.jar:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] at
> java.base/java.util.concurrent.CompletableFuture.uniApplyNow(CompletableFuture.java:684)
> ~[?:?]
> [00:39:02]W: [:ignite-jdbc:integrationTest] ... 53 more
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)