[
https://issues.apache.org/jira/browse/IGNITE-23877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17903573#comment-17903573
]
Roman Puchkovskiy commented on IGNITE-23877:
--------------------------------------------
There is no step 10; the issue description probably needs to be edited.
Also, the exact DDL statements would be helpful, especially for CREATE ZONE.
Anyway:
# If REPLICAS is 1, half of the primary replicas are on the second node. When
you stop it, half of the partitions no longer have primaries, so 'nothing
works' is the expected behavior (you only had 1 copy of the data and it is
not available).
# If REPLICAS is 2, then with Raft, which we use, one node being down means
that no partition has a majority, so no partition has a functioning primary.
This is also expected (even replication factors don't make a lot of sense,
and 2 is especially meaningless because it cannot maintain availability even
if just 1 node is offline).
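
The two cases above come down to quorum arithmetic. Here is a minimal sketch, assuming the standard Raft majority rule (more than half of the replicas in a group must be reachable). The class and method names are illustrative only and are not part of the Ignite API.

```java
// Illustrative only: QuorumMath and its methods are hypothetical helpers,
// not Ignite classes. They model the usual Raft majority rule.
final class QuorumMath {
    /** Smallest number of replicas that forms a strict majority of the group. */
    static int majority(int replicas) {
        return replicas / 2 + 1;
    }

    /** Number of replicas that may be offline while a majority is still reachable. */
    static int toleratedFailures(int replicas) {
        return replicas - majority(replicas);
    }

    /** True if the group still has a majority with the given number of replicas offline. */
    static boolean hasMajority(int replicas, int offline) {
        return replicas - offline >= majority(replicas);
    }

    public static void main(String[] args) {
        // Print the quorum size and fault tolerance for small replication factors.
        for (int replicas = 1; replicas <= 5; replicas++) {
            System.out.printf("REPLICAS=%d majority=%d toleratedFailures=%d%n",
                    replicas, majority(replicas), toleratedFailures(replicas));
        }
    }
}
```

Under this rule REPLICAS=2 tolerates as many failures as REPLICAS=1, namely none, and 3 is the smallest replication factor that keeps a majority when one replica is offline (which in turn presumably requires at least three nodes to host the replicas).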
> "Replication is timed out" when 1 of 2 nodes is down
> ----------------------------------------------------
>
> Key: IGNITE-23877
> URL: https://issues.apache.org/jira/browse/IGNITE-23877
> Project: Ignite
> Issue Type: Bug
> Components: persistence
> Affects Versions: 3.0.0-beta1
> Environment: 2 nodes (1 node is CMG, each node "-Xms512m",
> "-Xmx1536m"), each on separate host. Each host vCPU: 4, Memory: 32GB.
> Reporter: Igor
> Priority: Major
> Labels: ignite-3
> Attachments: servers_logsr.zip
>
>
> *Steps to reproduce:*
> # Start 2 nodes (1 node is CMG, each node "-Xms512m", "-Xmx1536m"), each on
> a separate host. Each host vCPU: 4, Memory: 32GB.
> # Set up a connection to both nodes:
> {code:java}
> IgniteClient.builder()
>     .retryPolicy(new RetryLimitPolicy())
>     .addresses(thinClientEndpoints.toArray(new String[0]))
>     .build()
> {code}
> # Create distribution zone
> # Create table
> # Insert row(s)
> # Select all before killing the node
> # Wait until the local state of all partitions of all tables is "HEALTHY"
> # Wait until the global state of all partitions of all tables is "AVAILABLE"
> # Kill the second (non-CMG) node
> # Select all after killing the node
> *Expected:*
> Correct data is returned.
> *Actual:*
> Exception returned on step 10:
> {code:java}
> org.opentest4j.AssertionFailedError: org.opentest4j.AssertionFailedError:
> Select after node is killed ==> Unexpected exception thrown:
> org.apache.ignite.sql.SqlException: Replication is timed out
> [replicaGrpId=17_part_10]
> org.opentest4j.AssertionFailedError: Select after node is killed ==>
> Unexpected exception thrown: org.apache.ignite.sql.SqlException: Replication
> is timed out [replicaGrpId=17_part_10]
> at
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:152)
> at
> app//org.junit.jupiter.api.AssertDoesNotThrow.createAssertionFailedError(AssertDoesNotThrow.java:84)
> at
> app//org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:53)
> at
> app//org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:40)
> at
> app//org.junit.jupiter.api.Assertions.assertDoesNotThrow(Assertions.java:3183)
> at
> app//org.gridgain.ai3tests.tests.ConnectionAfterNodeIsKilledTest.testThinClientConnectionToMultipleHostAfter1NodeIsKilled(ConnectionAfterNodeIsKilledTest.java:136)
> at [email protected]/java.lang.reflect.Method.invoke(Method.java:580)
> at [email protected]/java.util.concurrent.FutureTask.run(FutureTask.java:317)
> at
> [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
> at
> [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
> at [email protected]/java.lang.Thread.run(Thread.java:1583)
> Caused by: org.apache.ignite.sql.SqlException: IGN-REP-3
> TraceId:4fffa9f4-9fcb-43b0-bc7d-8fb37c6dfaa0 Replication is timed out
> [replicaGrpId=17_part_10]
> at
> [email protected]/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:733)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCauseInternal(ExceptionUtils.java:658)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:476)
> at
> app//org.apache.ignite.internal.client.sql.ClientSql.execute(ClientSql.java:106)
> at app//org.apache.ignite.sql.IgniteSql.execute(IgniteSql.java:57)
> at
> app//org.gridgain.ai3tests.tests.teststeps.ThinClientSteps.lambda$executeQuery$0(ThinClientSteps.java:64)
> at app//io.qameta.allure.Allure.lambda$step$1(Allure.java:127)
> at app//io.qameta.allure.Allure.step(Allure.java:181)
> at app//io.qameta.allure.Allure.step(Allure.java:125)
> at
> app//org.gridgain.ai3tests.tests.teststeps.ThinClientSteps.executeQuery(ThinClientSteps.java:64)
> at app//org.gridgain.ai3tests.tests.TestUtils.selectAll(TestUtils.java:174)
> at
> app//org.gridgain.ai3tests.tests.ConnectionAfterNodeIsKilledTest.lambda$testThinClientConnectionToMultipleHostAfter1NodeIsKilled$0(ConnectionAfterNodeIsKilledTest.java:137)
> at
> app//org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:49)
> ... 8 more
> Caused by: java.util.concurrent.CompletionException:
> org.apache.ignite.sql.SqlException: IGN-REP-3
> TraceId:4fffa9f4-9fcb-43b0-bc7d-8fb37c6dfaa0 Replication is timed out
> [replicaGrpId=17_part_10]
> at
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
> at
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
> at
> java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:936)
> at
> java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911)
> at
> java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
> at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
> at
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
> at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
> at
> java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
> at
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
> Caused by: org.apache.ignite.sql.SqlException: IGN-REP-3
> TraceId:4fffa9f4-9fcb-43b0-bc7d-8fb37c6dfaa0 Replication is timed out
> [replicaGrpId=17_part_10]
> at
> [email protected]/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:733)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
> at
> app//org.apache.ignite.internal.util.ViewUtils.copyExceptionWithCauseIfPossible(ViewUtils.java:91)
> at
> app//org.apache.ignite.internal.util.ViewUtils.ensurePublicException(ViewUtils.java:71)
> at
> app//org.apache.ignite.internal.client.TcpClientChannel.lambda$send$4(TcpClientChannel.java:388)
> at
> [email protected]/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934)
> ... 7 more
> Caused by: org.apache.ignite.sql.SqlException: IGN-REP-3
> TraceId:4fffa9f4-9fcb-43b0-bc7d-8fb37c6dfaa0 Replication is timed out
> [replicaGrpId=17_part_10]
> at
> [email protected]/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:733)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils$1.copy(ExceptionUtils.java:789)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils$ExceptionFactory.createCopy(ExceptionUtils.java:723)
> at
> app//org.apache.ignite.internal.util.ExceptionUtils.copyExceptionWithCause(ExceptionUtils.java:525)
> at
> app//org.apache.ignite.internal.client.TcpClientChannel.readError(TcpClientChannel.java:554)
> at
> app//org.apache.ignite.internal.client.TcpClientChannel.processNextMessage(TcpClientChannel.java:448)
> at
> app//org.apache.ignite.internal.client.TcpClientChannel.onMessage(TcpClientChannel.java:271)
> at
> app//org.apache.ignite.internal.client.io.netty.NettyClientConnection.onMessage(NettyClientConnection.java:117)
> at
> app//org.apache.ignite.internal.client.io.netty.NettyClientMessageHandler.channelRead(NettyClientMessageHandler.java:33)
> at
> app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at
> app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at
> app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at
> app//io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
> at
> app//io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
> at
> app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
> at
> app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at
> app//io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
> at
> app//io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
> at
> app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
> at
> app//io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
> at
> app//io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
> at
> app//io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
> at
> app//io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
> at
> app//io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
> at
> app//io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
> at app//io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
> at
> app//io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
> at
> app//io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at
> app//io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at [email protected]/java.lang.Thread.run(Thread.java:1583)
> Caused by: org.apache.ignite.lang.IgniteException: IGN-REP-3
> TraceId:4fffa9f4-9fcb-43b0-bc7d-8fb37c6dfaa0 To see the full stack trace set
> clientConnector.sendServerExceptionStackTraceToClient:true
> at
> app//org.apache.ignite.internal.client.TcpClientChannel.readError(TcpClientChannel.java:519)
> ... 25 more {code}
> [^servers_logsr.zip]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)