[ 
https://issues.apache.org/jira/browse/RATIS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954269#comment-17954269
 ] 

JiangHua Zhu commented on RATIS-2261:
-------------------------------------

I found at least two related issues.
1. When s0 starts early, it becomes the leader. After s3 and s4 are started, 
s0 eventually steps down to follower, because s3 or s4 carries a higher term 
and sends appendEntries to s0.
Here are some logs:

[link 
testInstallSnapshotDuringBootstrap.log|https://issues.apache.org/jira/secure/attachment/13076705/testInstallSnapshotDuringBootstrap.log]

2. When the client sends the setConfiguration() command, it cannot establish a 
connection with s0. Here are some timeout logs.
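
For reference, the step-down described in item 1 matches the general Raft term rule: a server that sees a higher term in an incoming request adopts that term and reverts to follower. Below is a minimal sketch of that rule only. The class, the Peer type, and onAppendEntries() are hypothetical names for illustration, not the Ratis implementation, and the term values are made up.

```java
/** Hypothetical, simplified model of the Raft term rule; not Ratis code. */
public class TermStepDownSketch {
  enum Role { FOLLOWER, CANDIDATE, LEADER }

  static final class Peer {
    final String id;
    long currentTerm;
    Role role;

    Peer(String id, long currentTerm, Role role) {
      this.id = id;
      this.currentTerm = currentTerm;
      this.role = role;
    }

    /**
     * Handles the term check of an appendEntries request.
     * Returns true if the request passes the term check.
     */
    boolean onAppendEntries(long senderTerm) {
      if (senderTerm < currentTerm) {
        // Stale sender: reject and keep the current role and term.
        return false;
      }
      if (senderTerm > currentTerm) {
        // Higher term seen: adopt it and step down, even if currently leader.
        currentTerm = senderTerm;
        role = Role.FOLLOWER;
      } else if (role == Role.CANDIDATE) {
        // Same term: a candidate recognizes the sender as the leader.
        role = Role.FOLLOWER;
      }
      return true;
    }
  }

  public static void main(String[] args) {
    // Assumed scenario: s0 became leader early in term 1; a later request
    // arrives carrying term 3 (illustrative values only).
    Peer s0 = new Peer("s0", 1, Role.LEADER);
    boolean accepted = s0.onAppendEntries(3);
    System.out.println(s0.id + " accepted=" + accepted
        + " role=" + s0.role + " term=" + s0.currentTerm);
  }
}
```

Under this rule, a client that still addresses s0 as the leader after the step-down would need to be redirected or retry elsewhere, which may be related to the timeout in item 2.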


> Intermittent failure in 
> TestRaftSnapshotWithGrpc.testInstallSnapshotDuringBootstrap
> -----------------------------------------------------------------------------------
>
>                 Key: RATIS-2261
>                 URL: https://issues.apache.org/jira/browse/RATIS-2261
>             Project: Ratis
>          Issue Type: Bug
>          Components: gRPC, test
>            Reporter: Attila Doroszlai
>            Priority: Major
>         Attachments: 
> org.apache.ratis.grpc.TestRaftSnapshotWithGrpc-output.txt, 
> testInstallSnapshotDuringBootstrap.log
>
>
> {code}
> Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 114.816 s <<< 
> FAILURE! - in org.apache.ratis.grpc.TestRaftSnapshotWithGrpc
> org.apache.ratis.grpc.TestRaftSnapshotWithGrpc.testInstallSnapshotDuringBootstrap
>   Time elapsed: 101.468 s  <<< ERROR!
> java.util.concurrent.TimeoutException: testInstallSnapshotDuringBootstrap() 
> timed out after 100 seconds
>       at java.util.ArrayList.forEach(ArrayList.java:1259)
>       at java.util.ArrayList.forEach(ArrayList.java:1259)
>       Suppressed: java.io.InterruptedIOException: retry 
> policy=RetryForeverWithSleep(sleepTime = 100ms)
>               at 
> org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:138)
>               at 
> org.apache.ratis.client.impl.AdminImpl.setConfiguration(AdminImpl.java:46)
>               at 
> org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:51)
>               at 
> org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:45)
>               at 
> org.apache.ratis.server.impl.MiniRaftCluster.setConfiguration(MiniRaftCluster.java:836)
>               at 
> org.apache.ratis.statemachine.RaftSnapshotBaseTest.lambda$testInstallSnapshotDuringBootstrap$6(RaftSnapshotBaseTest.java:309)
>               at 
> org.apache.ratis.server.impl.RaftServerTestUtil.runWithMinorityPeers(RaftServerTestUtil.java:231)
>               at 
> org.apache.ratis.statemachine.RaftSnapshotBaseTest.testInstallSnapshotDuringBootstrap(RaftSnapshotBaseTest.java:308)
> {code}
> Failed in 2/100 runs:
> https://github.com/adoroszlai/ratis/actions/runs/13901407901



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
