[ https://issues.apache.org/jira/browse/IGNITE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Igor resolved IGNITE-22376.
---------------------------
Resolution: Not A Problem
The single CMG node was killed.
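
For context, a minimal sketch of why losing the only CMG node can stall the logical topology request. It assumes the CMG is a Raft group that needs a strict majority of its voting members to serve requests; the class and method names are illustrative and are not Ignite APIs.

```java
public final class QuorumSketch {
    /** Smallest number of live voters that forms a strict majority. */
    static int majority(int voters) {
        return voters / 2 + 1;
    }

    /** True if enough voters are alive for the group to keep serving requests. */
    static boolean hasQuorum(int voters, int alive) {
        return alive >= majority(voters);
    }

    public static void main(String[] args) {
        // One-node CMG whose only member was killed: 0 of 1 voters alive.
        System.out.println(hasQuorum(1, 0)); // prints "false"
        // Hypothetical three-node CMG with one member killed: 2 of 3 voters alive.
        System.out.println(hasQuorum(3, 2)); // prints "true"
    }
}
```

Under this assumption, a replacement node with a fresh state does not restore the group, because the killed node was the group's only voting member.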
> RestAPI get logical topology freeze if 1 node was replaced in 3 nodes cluster
> -----------------------------------------------------------------------------
>
> Key: IGNITE-22376
> URL: https://issues.apache.org/jira/browse/IGNITE-22376
> Project: Ignite
> Issue Type: Bug
> Components: general, networking, persistence, rest
> Affects Versions: 3.0.0-beta1, 3.0.0-beta2
> Environment: The 3 nodes cluster running locally.
> Reporter: Igor
> Assignee: Aleksandr Polovtsev
> Priority: Major
> Labels: ignite-3
>
> *Steps to reproduce:*
> # Create a zone with a replication factor equal to the number of nodes (2 or
> 3, respectively).
> # Create 10 tables inside the zone.
> # Insert 100 rows into every table.
> # Wait until the local state is "HEALTHY" for every table, partition, and node.
> # Wait until the global state is "AVAILABLE" for every table, partition, and
> node.
> # Kill the first node with kill -9.
> # Create a new node and attach it to the cluster in place of the killed one.
> # Using the REST API, check the physical topology until it contains only the
> 3 alive nodes.
> # Using the REST API, check the *logical* topology until it contains only the
> 3 alive nodes.
> *Expected:*
> Data is returned.
> *Actual:*
> At step 9 the request freezes and throws:
> {code:java}
> org.gridgain.ai3tests.core.generated.restapi.invoker.ApiException: Message: java.net.SocketTimeoutException: timeout
> HTTP response code: 0
> HTTP response body: null
> HTTP response headers: null
> at org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient.execute(ApiClient.java:1047)
> at org.gridgain.ai3tests.core.generated.restapi.api.TopologyApi.logicalWithHttpInfo(TopologyApi.java:174)
> at org.gridgain.ai3tests.core.generated.restapi.api.TopologyApi.logical(TopologyApi.java:154)
> at org.gridgain.ai3tests.core.ignite.topology.TopologyUtils.getTopology(TopologyUtils.java:121)
> at org.gridgain.ai3tests.core.ignite.topology.TopologyUtils.lambda$waitForTopology$0(TopologyUtils.java:74)
> at org.gridgain.ai3tests.core.utils.RetryUtils.retryOnAllowedException(RetryUtils.java:40)
> at org.gridgain.ai3tests.core.ignite.topology.TopologyUtils.waitForTopology(TopologyUtils.java:72)
> at org.gridgain.ai3tests.core.ignite.topology.TopologyUtils.waitForLogicalTopology(TopologyUtils.java:56)
> at org.gridgain.ai3tests.tests.failover.ClusterFailover3NodesTest.killNodeAndReplaceWithNewEmptyOne(ClusterFailover3NodesTest.java:155)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.SocketTimeoutException: timeout
> at okio.SocketAsyncTimeout.newTimeoutException(JvmOkio.kt:146)
> at okio.AsyncTimeout.access$newTimeoutException(AsyncTimeout.kt:161)
> at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:339)
> at okio.RealBufferedSource.indexOf(RealBufferedSource.kt:430)
> at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.kt:323)
> at okhttp3.internal.http1.HeadersReader.readLine(HeadersReader.kt:29)
> at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:180)
> at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:110)
> at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:93)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
> at org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient$2.intercept(ApiClient.java:1457)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
> at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
> at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
> at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
> at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
> at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
> at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
> at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154)
> at org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient.execute(ApiClient.java:1043)
> ... 13 more
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.base/java.net.SocketInputStream.socketRead0(Native Method)
> at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
> at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
> at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
> at okio.InputStreamSource.read(JvmOkio.kt:93)
> at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:128)
> ... 33 more
> {code}
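
The trace above shows the client blocking until its read timeout fires. A minimal sketch of a deadline-bounded poll that reports "no answer in time" instead of hanging is given below. It is not the test framework's code: the class and method names are hypothetical, and the endpoint path `/management/v1/cluster/topology/logical` and port 10300 are assumptions about a default local Ignite 3 node. For brevity it only checks that the endpoint answers with HTTP 200 and does not count the nodes in the response.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.BooleanSupplier;

public final class BoundedTopologyWait {
    /** Polls the check until it returns true or the overall deadline passes. */
    static boolean waitFor(BooleanSupplier check, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                if (check.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Treat a failed attempt as "not ready yet" and retry.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Returns true if the (assumed) logical topology endpoint answers with HTTP 200. */
    static boolean logicalTopologyResponds(HttpClient client, String baseUrl) {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(baseUrl + "/management/v1/cluster/topology/logical"))
                .timeout(Duration.ofSeconds(5)) // per-request timeout
                .GET()
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Includes connection failures and request timeouts.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
        boolean ok = waitFor(
                () -> logicalTopologyResponds(client, "http://localhost:10300"),
                Duration.ofSeconds(60),
                Duration.ofSeconds(1));
        System.out.println(ok ? "logical topology is served" : "logical topology did not respond in time");
    }
}
```

A bounded wait like this does not fix the underlying condition; it only makes the failure explicit once the deadline passes.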
> The server logs show continuous errors:
> {code:java}
> 2024-05-30 10:51:37:069 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-9][AbstractClientService] Fail to connect ClusterFailover3NodesTest_cluster_0, exception: java.net.ConnectException.
> 2024-05-30 10:51:37:069 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-9][ReplicatorGroupImpl] Fail to check replicator connection to peer=ClusterFailover3NodesTest_cluster_0, replicatorType=Follower.
> 2024-05-30 10:51:37:069 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-15][AbstractClientService] Fail to connect ClusterFailover3NodesTest_cluster_0, exception: java.net.ConnectException.
> 2024-05-30 10:51:37:069 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-15][ReplicatorGroupImpl] Fail to check replicator connection to peer=ClusterFailover3NodesTest_cluster_0, replicatorType=Follower.
> 2024-05-30 10:51:37:069 +0200 [WARNING][%ClusterFailover3NodesTest_cluster_1%Raft-Group-Client-6][RaftGroupServiceImpl] Recoverable error during the request occurred (will be retried on the randomly selected node) [request=ReadActionRequestImpl [command=GetCommandImpl [key=[97, 115, 115, 105, 103, 110, 109, 101, 110, 116, 115, 46, 112, 101, 110, 100, 105, 110, 103, 46, 50, 54, 95, 112, 97, 114, 116, 95, 56], revision=-1], groupId=metastorage_group, readOnlySafe=true], peer=Peer [consistentId=ClusterFailover3NodesTest_cluster_0, idx=0], newPeer=Peer [consistentId=ClusterFailover3NodesTest_cluster_0, idx=0]].
> java.util.concurrent.CompletionException: java.net.ConnectException: Peer ClusterFailover3NodesTest_cluster_0 is unavailable
> at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
> at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1099)
> at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
> at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:558)
> at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleThrowable$41(RaftGroupServiceImpl.java:605)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.ConnectException: Peer ClusterFailover3NodesTest_cluster_0 is unavailable
> at org.apache.ignite.internal.raft.RaftGroupServiceImpl.resolvePeer(RaftGroupServiceImpl.java:806)
> at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:557)
> ... 7 more
> 2024-05-30 10:51:37:069 +0200 [WARNING][%ClusterFailover3NodesTest_cluster_1%Raft-Group-Client-11][RaftGroupServiceImpl] Recoverable error during the request occurred (will be retried on the randomly selected node) [request=ReadActionRequestImpl [command=GetCommandImpl [key=[97, 115, 115, 105, 103, 110, 109, 101, 110, 116, 115, 46, 112, 101, 110, 100, 105, 110, 103, 46, 49, 56, 95, 112, 97, 114, 116, 95, 49, 48], revision=-1], groupId=metastorage_group, readOnlySafe=true], peer=Peer [consistentId=ClusterFailover3NodesTest_cluster_0, idx=0], newPeer=Peer [consistentId=ClusterFailover3NodesTest_cluster_0, idx=0]].
> java.util.concurrent.CompletionException: java.net.ConnectException: Peer ClusterFailover3NodesTest_cluster_0 is unavailable
> at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
> at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1099)
> at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
> at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:558)
> at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleThrowable$41(RaftGroupServiceImpl.java:605)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.ConnectException: Peer ClusterFailover3NodesTest_cluster_0 is unavailable
> at org.apache.ignite.internal.raft.RaftGroupServiceImpl.resolvePeer(RaftGroupServiceImpl.java:806)
> at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:557)
> ... 7 more
> {code}
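
The `key=[...]` arrays in the warnings above are printed as raw byte values, which hides which meta storage entries the node keeps retrying. A minimal sketch for decoding them follows, assuming the keys are UTF-8 text; the class name is illustrative.

```java
import java.nio.charset.StandardCharsets;

public final class KeyDecoder {
    /** Converts the decimal byte values printed in the log back into a string. */
    static String decode(int... codes) {
        byte[] bytes = new byte[codes.length];
        for (int i = 0; i < codes.length; i++) {
            bytes[i] = (byte) codes[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Key from the first warning in the log.
        System.out.println(decode(97, 115, 115, 105, 103, 110, 109, 101, 110, 116, 115, 46,
                112, 101, 110, 100, 105, 110, 103, 46, 50, 54, 95, 112, 97, 114, 116, 95, 56));
        // prints "assignments.pending.26_part_8"
    }
}
```

The second warning's key decodes the same way to `assignments.pending.18_part_10`, so both retries are reads of pending assignment keys from `metastorage_group` addressed to the killed peer.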
--
This message was sent by Atlassian Jira
(v8.20.10#820010)