[ https://issues.apache.org/jira/browse/IGNITE-22376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Igor resolved IGNITE-22376.
---------------------------
    Resolution: Not A Problem

The killed node was the cluster's only CMG (Cluster Management Group) node.
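
Why this stalls the logical topology request: the CMG is a Raft group, and a Raft group can elect a leader and commit only while a majority of its voting members is alive. The logical topology is tracked by the CMG, so once its only member is gone the request has nothing to answer it. The majority arithmetic is sketched below; `RaftQuorum`, `majority` and `hasQuorum` are hypothetical names for illustration, not Ignite APIs.

```java
final class RaftQuorum {
    // Smallest number of voters that forms a strict majority: floor(n / 2) + 1.
    static int majority(int voters) {
        return voters / 2 + 1;
    }

    // A Raft group can elect a leader and commit entries only while
    // at least a majority of its voting members is alive.
    static boolean hasQuorum(int voters, int alive) {
        return alive >= majority(voters);
    }
}
```

With a single-member CMG, `hasQuorum(1, 0)` is false, so killing that node leaves the group without a leader. A 3-member CMG (`hasQuorum(3, 2)` is true) would tolerate the loss of one node.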

> RestAPI get logical topology freeze if 1 node was replaced in 3 nodes cluster
> -----------------------------------------------------------------------------
>
>                 Key: IGNITE-22376
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22376
>             Project: Ignite
>          Issue Type: Bug
>          Components: general, networking, persistence, rest
>    Affects Versions: 3.0.0-beta1, 3.0.0-beta2
>         Environment: A 3-node cluster running locally.
>            Reporter: Igor
>            Assignee: Aleksandr Polovtsev
>            Priority: Major
>              Labels: ignite-3
>
> *Steps to reproduce:*
>  # Create a zone with the number of replicas equal to the number of nodes (2 or 3, respectively).
>  # Create 10 tables in the zone.
>  # Insert 100 rows into every table.
>  # Wait until the local state of every table partition on every node is "HEALTHY".
>  # Wait until the global state of every table partition on every node is "AVAILABLE".
>  # Kill the first node with {{kill -9}}.
>  # Create a new node and attach it to the cluster in place of the killed one.
>  # Using the REST API, poll the physical topology until it contains only the 3 alive nodes.
>  # Using the REST API, poll the *logical* topology until it contains only the 3 alive nodes.
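
Steps 8 and 9 poll the REST API until the topology settles. A minimal sketch of that loop with the JDK's `java.net.http` client follows. It assumes the logical topology is served at `/management/v1/cluster/topology/logical` on the default REST port 10300; both are assumptions to verify against your build, and `TopologyPoller` and its methods are hypothetical. Each attempt gets its own request timeout (a hung call surfaces as `java.net.http.HttpTimeoutException`), and any exception is treated as "not ready yet", so the loop ends at an overall deadline.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Callable;

final class TopologyPoller {
    // Assumed endpoint path (not taken from the issue); check it against your Ignite 3 version.
    static final String LOGICAL_PATH = "/management/v1/cluster/topology/logical";

    // Builds a GET request with a per-request timeout so one hung call cannot block indefinitely.
    static HttpRequest logicalTopologyRequest(String host, int port, Duration requestTimeout) {
        return HttpRequest.newBuilder(URI.create("http://" + host + ":" + port + LOGICAL_PATH))
                .timeout(requestTimeout)
                .GET()
                .build();
    }

    // Sends the request and returns the raw response body; JSON parsing is left to the caller.
    static String fetch(HttpClient client, HttpRequest request) throws Exception {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + response.statusCode());
        }
        return response.body();
    }

    // Calls nodeCount until it returns the expected value or the deadline passes.
    // Exceptions (timeouts, connection errors) count as "not ready yet" and are retried.
    static boolean waitForNodeCount(Callable<Integer> nodeCount, int expected,
                                    Duration deadline, Duration interval) throws InterruptedException {
        Instant end = Instant.now().plus(deadline);
        while (Instant.now().isBefore(end)) {
            try {
                if (nodeCount.call() == expected) {
                    return true;
                }
            } catch (Exception e) {
                // Attempt failed; fall through and retry after the interval.
            }
            Thread.sleep(interval.toMillis());
        }
        return false;
    }
}
```

A caller would pass something like `() -> countNodes(TopologyPoller.fetch(client, request))`, where `countNodes` is a hypothetical helper that counts the nodes in the JSON body.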
> *Expected:*
> The logical topology is returned.
> *Actual:*
> At step 9, the request hangs and then fails with:
> {code:java}
> org.gridgain.ai3tests.core.generated.restapi.invoker.ApiException: Message: java.net.SocketTimeoutException: timeout
> HTTP response code: 0
> HTTP response body: null
> HTTP response headers: null
>     at org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient.execute(ApiClient.java:1047)
>     at org.gridgain.ai3tests.core.generated.restapi.api.TopologyApi.logicalWithHttpInfo(TopologyApi.java:174)
>     at org.gridgain.ai3tests.core.generated.restapi.api.TopologyApi.logical(TopologyApi.java:154)
>     at org.gridgain.ai3tests.core.ignite.topology.TopologyUtils.getTopology(TopologyUtils.java:121)
>     at org.gridgain.ai3tests.core.ignite.topology.TopologyUtils.lambda$waitForTopology$0(TopologyUtils.java:74)
>     at org.gridgain.ai3tests.core.utils.RetryUtils.retryOnAllowedException(RetryUtils.java:40)
>     at org.gridgain.ai3tests.core.ignite.topology.TopologyUtils.waitForTopology(TopologyUtils.java:72)
>     at org.gridgain.ai3tests.core.ignite.topology.TopologyUtils.waitForLogicalTopology(TopologyUtils.java:56)
>     at org.gridgain.ai3tests.tests.failover.ClusterFailover3NodesTest.killNodeAndReplaceWithNewEmptyOne(ClusterFailover3NodesTest.java:155)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.SocketTimeoutException: timeout
>     at okio.SocketAsyncTimeout.newTimeoutException(JvmOkio.kt:146)
>     at okio.AsyncTimeout.access$newTimeoutException(AsyncTimeout.kt:161)
>     at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:339)
>     at okio.RealBufferedSource.indexOf(RealBufferedSource.kt:430)
>     at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.kt:323)
>     at okhttp3.internal.http1.HeadersReader.readLine(HeadersReader.kt:29)
>     at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:180)
>     at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:110)
>     at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:93)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
>     at org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient$2.intercept(ApiClient.java:1457)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
>     at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
>     at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
>     at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
>     at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
>     at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
>     at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201)
>     at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154)
>     at org.gridgain.ai3tests.core.generated.restapi.invoker.ApiClient.execute(ApiClient.java:1043)
>     ... 13 more
> Caused by: java.net.SocketTimeoutException: Read timed out
>     at java.base/java.net.SocketInputStream.socketRead0(Native Method)
>     at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
>     at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
>     at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
>     at okio.InputStreamSource.read(JvmOkio.kt:93)
>     at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:128)
>     ... 33 more
> {code}
> The server logs show continuous errors:
> {code:java}
> 2024-05-30 10:51:37:069 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-9][AbstractClientService] Fail to connect ClusterFailover3NodesTest_cluster_0, exception: java.net.ConnectException.
> 2024-05-30 10:51:37:069 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-9][ReplicatorGroupImpl] Fail to check replicator connection to peer=ClusterFailover3NodesTest_cluster_0, replicatorType=Follower.
> 2024-05-30 10:51:37:069 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-15][AbstractClientService] Fail to connect ClusterFailover3NodesTest_cluster_0, exception: java.net.ConnectException.
> 2024-05-30 10:51:37:069 +0200 [ERROR][%ClusterFailover3NodesTest_cluster_1%JRaft-StepDownTimer-15][ReplicatorGroupImpl] Fail to check replicator connection to peer=ClusterFailover3NodesTest_cluster_0, replicatorType=Follower.
> 2024-05-30 10:51:37:069 +0200 [WARNING][%ClusterFailover3NodesTest_cluster_1%Raft-Group-Client-6][RaftGroupServiceImpl] Recoverable error during the request occurred (will be retried on the randomly selected node) [request=ReadActionRequestImpl [command=GetCommandImpl [key=[97, 115, 115, 105, 103, 110, 109, 101, 110, 116, 115, 46, 112, 101, 110, 100, 105, 110, 103, 46, 50, 54, 95, 112, 97, 114, 116, 95, 56], revision=-1], groupId=metastorage_group, readOnlySafe=true], peer=Peer [consistentId=ClusterFailover3NodesTest_cluster_0, idx=0], newPeer=Peer [consistentId=ClusterFailover3NodesTest_cluster_0, idx=0]].
> java.util.concurrent.CompletionException: java.net.ConnectException: Peer ClusterFailover3NodesTest_cluster_0 is unavailable
>   at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>   at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1099)
>   at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
>   at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:558)
>   at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleThrowable$41(RaftGroupServiceImpl.java:605)
>   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.ConnectException: Peer ClusterFailover3NodesTest_cluster_0 is unavailable
>   at org.apache.ignite.internal.raft.RaftGroupServiceImpl.resolvePeer(RaftGroupServiceImpl.java:806)
>   at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:557)
>   ... 7 more
> 2024-05-30 10:51:37:069 +0200 [WARNING][%ClusterFailover3NodesTest_cluster_1%Raft-Group-Client-11][RaftGroupServiceImpl] Recoverable error during the request occurred (will be retried on the randomly selected node) [request=ReadActionRequestImpl [command=GetCommandImpl [key=[97, 115, 115, 105, 103, 110, 109, 101, 110, 116, 115, 46, 112, 101, 110, 100, 105, 110, 103, 46, 49, 56, 95, 112, 97, 114, 116, 95, 49, 48], revision=-1], groupId=metastorage_group, readOnlySafe=true], peer=Peer [consistentId=ClusterFailover3NodesTest_cluster_0, idx=0], newPeer=Peer [consistentId=ClusterFailover3NodesTest_cluster_0, idx=0]].
> java.util.concurrent.CompletionException: java.net.ConnectException: Peer ClusterFailover3NodesTest_cluster_0 is unavailable
>   at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
>   at java.base/java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1099)
>   at java.base/java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235)
>   at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:558)
>   at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$handleThrowable$41(RaftGroupServiceImpl.java:605)
>   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.ConnectException: Peer ClusterFailover3NodesTest_cluster_0 is unavailable
>   at org.apache.ignite.internal.raft.RaftGroupServiceImpl.resolvePeer(RaftGroupServiceImpl.java:806)
>   at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:557)
>   ... 7 more
> {code}
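
The metastorage keys in the WARNING entries above are logged as raw byte arrays. Decoding them, assuming they are UTF-8 text, gives `assignments.pending.26_part_8` and `assignments.pending.18_part_10`, which appear to be pending-assignment keys for individual table partitions; the retried reads for them keep resolving to the killed peer `ClusterFailover3NodesTest_cluster_0`. A small helper for such logs (`MetastorageKeyDecoder` is a hypothetical name, not part of Ignite):

```java
import java.nio.charset.StandardCharsets;

final class MetastorageKeyDecoder {
    // Rebuilds the key from the numbers printed in the log and decodes it as UTF-8.
    // The (byte) cast also accepts negative values, which is how Java prints bytes >= 0x80.
    static String decode(int[] loggedBytes) {
        byte[] bytes = new byte[loggedBytes.length];
        for (int i = 0; i < loggedBytes.length; i++) {
            bytes[i] = (byte) loggedBytes[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For example, `MetastorageKeyDecoder.decode(new int[] {97, 115, 115, 105, 103, 110, 109, 101, 110, 116, 115, 46, 112, 101, 110, 100, 105, 110, 103, 46, 50, 54, 95, 112, 97, 114, 116, 95, 56})` returns `"assignments.pending.26_part_8"`.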



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
