Denis Chudov created IGNITE-27426:
-------------------------------------
Summary: Txn request retries on network errors
Key: IGNITE-27426
URL: https://issues.apache.org/jira/browse/IGNITE-27426
Project: Ignite
Issue Type: Bug
Reporter: Denis Chudov
We should consider retrying txn requests on network errors caused by, for
example, a node stop, as we already do for primary replica miss, replica
unavailable, and similar errors.
Stack trace for context:
{code:java}
2025-12-19 18:16:43:524 +0000 [WARNING][%poc-tester-SERVER-192.168.208.114-id-0%partition-operations-23][ReplicaManager] Failed to process replica request
[request=ReadOnlyScanRetrieveBatchReplicaRequestImpl [batchSize=512,
columnsToInclude=null, coordinatorId=82b88b02-3969-46c5-825f-5e56c3580470,
exactKey=null, flags=0, groupId=ZonePartitionIdMessageImpl [partitionId=7,
zoneId=24], indexToUse=null, lowerBoundPrefix=null,
readTimestamp=HybridTimestamp [physical=2025-12-19 18:16:37:999 +0000,
logical=0, composite=115747599024062464], scanId=3767, tableId=180,
timestamp=null, transactionId=019b37d4-014b-0000-773b-bc7300000001,
upperBoundPrefix=null, usePrimary=true]].
java.util.concurrent.CompletionException: org.apache.ignite.internal.replicator.exception.ReplicationException: IGN-REP-1 Failed to process replica request [replicaGroupId=ZonePartitionIdMessageImpl [partitionId=1, zoneId=24]] TraceId:3ca4e5f1
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
    at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    at org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplicaRaw$8(ReplicaService.java:151)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
    at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:910)
    at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.ignite.internal.replicator.exception.ReplicationException: IGN-REP-1 Failed to process replica request [replicaGroupId=ZonePartitionIdMessageImpl [partitionId=1, zoneId=24]] TraceId:3ca4e5f1
    at org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$1(ExceptionUtils.java:541)
    at org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:606)
    at org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:541)
    ... 10 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.208.86:3344
Caused by: java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:384)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.handle(AbstractNioChannel.java:432)
    at io.netty.channel.nio.NioIoHandler$DefaultNioRegistration.handle(NioIoHandler.java:388)
    at io.netty.channel.nio.NioIoHandler.processSelectedKey(NioIoHandler.java:596)
    at io.netty.channel.nio.NioIoHandler.processSelectedKeysOptimized(NioIoHandler.java:571)
    at io.netty.channel.nio.NioIoHandler.processSelectedKeys(NioIoHandler.java:512)
    at io.netty.channel.nio.NioIoHandler.run(NioIoHandler.java:484)
    at io.netty.channel.SingleThreadIoEventLoop.runIo(SingleThreadIoEventLoop.java:225)
    at io.netty.channel.SingleThreadIoEventLoop.run(SingleThreadIoEventLoop.java:196)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:1193)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
{code}
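A rough sketch of the kind of retry being proposed, for discussion only. It uses plain JDK classes and does not reflect existing Ignite code: the class name NetworkRetrySketch and the methods isNetworkError and retryOnNetworkError are hypothetical. It assumes that a network failure can be recognised by a java.net.ConnectException somewhere in the cause chain (as in the trace above, where it sits under CompletionException and ReplicationException), and that the request is safe to re-send. A real implementation would probably also need a delay between attempts, an overall timeout, and re-resolution of the target node.

```java
import java.net.ConnectException;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical helper (not an Ignite class): retries an async request on network errors. */
public class NetworkRetrySketch {
    /** Returns true if any exception in the cause chain is a ConnectException. */
    public static boolean isNetworkError(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Invokes the request and re-invokes it while it fails with a network error,
     * making at most maxAttempts invocations in total. Other errors are not retried.
     */
    public static <T> CompletableFuture<T> retryOnNetworkError(
            Supplier<CompletableFuture<T>> request, int maxAttempts) {
        return request.get()
                .<CompletableFuture<T>>handle((res, err) -> {
                    if (err == null) {
                        return CompletableFuture.completedFuture(res);
                    }
                    if (maxAttempts > 1 && isNetworkError(err)) {
                        // Assumption: re-sending the same request is safe.
                        return retryOnNetworkError(request, maxAttempts - 1);
                    }
                    return CompletableFuture.failedFuture(err);
                })
                .thenCompose(f -> f);
    }
}
```

The cause-chain walk is there because the ConnectException in the trace is not the top-level exception; checking only the outermost exception type would miss it.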