Denis Chudov created IGNITE-27426:
-------------------------------------

             Summary: Txn request retries on network errors
                 Key: IGNITE-27426
                 URL: https://issues.apache.org/jira/browse/IGNITE-27426
             Project: Ignite
          Issue Type: Bug
            Reporter: Denis Chudov


We should consider retrying transaction requests on network errors (caused, for 
example, by a node stopping), as we already do for primary replica miss, replica 
unavailable, and similar errors.

Stack trace for context:
{code:java}
2025-12-19 18:16:43:524 +0000 [WARNING][%poc-tester-SERVER-192.168.208.114-id-0%partition-operations-23][ReplicaManager] Failed to process replica request [request=ReadOnlyScanRetrieveBatchReplicaRequestImpl [batchSize=512, columnsToInclude=null, coordinatorId=82b88b02-3969-46c5-825f-5e56c3580470, exactKey=null, flags=0, groupId=ZonePartitionIdMessageImpl [partitionId=7, zoneId=24], indexToUse=null, lowerBoundPrefix=null, readTimestamp=HybridTimestamp [physical=2025-12-19 18:16:37:999 +0000, logical=0, composite=115747599024062464], scanId=3767, tableId=180, timestamp=null, transactionId=019b37d4-014b-0000-773b-bc7300000001, upperBoundPrefix=null, usePrimary=true]].
java.util.concurrent.CompletionException: org.apache.ignite.internal.replicator.exception.ReplicationException: IGN-REP-1 Failed to process replica request [replicaGroupId=ZonePartitionIdMessageImpl [partitionId=1, zoneId=24]] TraceId:3ca4e5f1
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
    at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    at org.apache.ignite.internal.replicator.ReplicaService.lambda$sendToReplicaRaw$8(ReplicaService.java:151)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
    at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:910)
    at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.ignite.internal.replicator.exception.ReplicationException: IGN-REP-1 Failed to process replica request [replicaGroupId=ZonePartitionIdMessageImpl [partitionId=1, zoneId=24]] TraceId:3ca4e5f1
    at org.apache.ignite.internal.util.ExceptionUtils.lambda$withCause$1(ExceptionUtils.java:541)
    at org.apache.ignite.internal.util.ExceptionUtils.withCauseInternal(ExceptionUtils.java:606)
    at org.apache.ignite.internal.util.ExceptionUtils.withCause(ExceptionUtils.java:541)
    ... 10 more
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /192.168.208.86:3344
Caused by: java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:384)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.handle(AbstractNioChannel.java:432)
    at io.netty.channel.nio.NioIoHandler$DefaultNioRegistration.handle(NioIoHandler.java:388)
    at io.netty.channel.nio.NioIoHandler.processSelectedKey(NioIoHandler.java:596)
    at io.netty.channel.nio.NioIoHandler.processSelectedKeysOptimized(NioIoHandler.java:571)
    at io.netty.channel.nio.NioIoHandler.processSelectedKeys(NioIoHandler.java:512)
    at io.netty.channel.nio.NioIoHandler.run(NioIoHandler.java:484)
    at io.netty.channel.SingleThreadIoEventLoop.runIo(SingleThreadIoEventLoop.java:225)
    at io.netty.channel.SingleThreadIoEventLoop.run(SingleThreadIoEventLoop.java:196)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:1193)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
{code}
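
For illustration only, a minimal sketch of what such a retry could look like. The connection failure in the trace is buried under CompletionException and ReplicationException, so the sketch walks the cause chain looking for java.net.ConnectException and re-issues the request only in that case. NetworkRetrySketch, causedByConnectFailure and withRetries are hypothetical names, not existing Ignite APIs, and the sketch assumes the request is safe to re-send. A real implementation would presumably also add a backoff and re-resolve the target replica before retrying.

```java
import java.net.ConnectException;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of Ignite. */
final class NetworkRetrySketch {
    private NetworkRetrySketch() {
    }

    /** Returns true if any exception in the cause chain is a ConnectException. */
    static boolean causedByConnectFailure(Throwable t) {
        // The depth bound guards against cyclic cause chains.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Invokes the request up to maxAttempts times in total, re-issuing it only
     * when the failure was caused by a refused or failed connection.
     */
    static <T> CompletableFuture<T> withRetries(Supplier<CompletableFuture<T>> request, int maxAttempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(request, maxAttempts, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> request, int attemptsLeft, CompletableFuture<T> result) {
        request.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (attemptsLeft > 1 && causedByConnectFailure(error)) {
                // Assumption: the request is idempotent, so re-sending is safe.
                attempt(request, attemptsLeft - 1, result);
            } else {
                // Not a connection failure, or attempts exhausted: propagate as is.
                result.completeExceptionally(error);
            }
        });
    }
}
```

Usage would be along the lines of NetworkRetrySketch.withRetries(() -> sendRequest(), 3), where sendRequest stands for whatever call returns the request future. Failures that are not connection-related are propagated unchanged after the first attempt.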



--
This message was sent by Atlassian Jira
(v8.20.10#820010)