[ 
https://issues.apache.org/jira/browse/RATIS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18021590#comment-18021590
 ] 

Tsz-wo Sze edited comment on RATIS-2261 at 9/20/25 7:34 PM:
------------------------------------------------------------

[~jianghuazhu], thanks for your help on debugging!

bq. 1. ... After starting s3 and s4 subsequently, ...

This is a bug in the test.  It starts with a single server, s0, and then adds 
two new servers, s3 and s4.  That change is a majority-add, which should be 
disallowed; see RATIS-1930.  We could simply start the test with 3 servers; 
adding 2 new servers is then fine.
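
The arithmetic behind this can be sketched as follows.  This is only an 
illustration under an assumed definition (a change is a majority-add when the 
retained peers no longer form a majority of the new configuration); it is not 
the check implemented in Ratis for RATIS-1930.  The class name, the helper 
isMajorityAdd and the peer names s1 and s2 are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MajorityAddSketch {
  /**
   * Hypothetical helper, NOT the Ratis implementation.
   * Assumed definition: the change is a majority-add if the peers retained
   * from the current configuration are fewer than a majority of the proposed one.
   */
  static boolean isMajorityAdd(Set<String> current, Set<String> proposed) {
    // Peers that exist in both the current and the proposed configuration.
    final Set<String> retained = new HashSet<>(proposed);
    retained.retainAll(current);
    // Smallest number of peers that forms a majority of the proposed configuration.
    final int majority = proposed.size() / 2 + 1;
    return retained.size() < majority;
  }

  static Set<String> peers(String... ids) {
    return new HashSet<>(Arrays.asList(ids));
  }

  public static void main(String[] args) {
    // 1 server plus 2 new ones: 1 retained peer, majority of 3 is 2.
    System.out.println(isMajorityAdd(peers("s0"), peers("s0", "s3", "s4")));
    // 3 servers plus 2 new ones: 3 retained peers, majority of 5 is 3.
    System.out.println(isMajorityAdd(peers("s0", "s1", "s2"),
        peers("s0", "s1", "s2", "s3", "s4")));
  }
}
```

Under that assumed definition the first call prints true (the current test 
setup) and the second prints false (the proposed setup with 3 initial servers).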

bq. 2. When the client sends the setConfiguration() command, it cannot 
establish a connection with s3 or s4.  ...

It seems that the servers were still starting and their RPC ports were not yet 
ready.  The client retries, so this should not be a problem.
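
As a rough illustration of why a not-yet-ready port is harmless when the 
caller retries, here is a generic retry-with-sleep loop.  It is a sketch only 
and does not use the Ratis client or its retry policy classes; the class name 
and the helper retryWithSleep are hypothetical, and unlike the 
RetryForeverWithSleep policy shown in the stack trace, this version gives up 
after a bounded number of attempts.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {
  /**
   * Hypothetical helper, NOT a Ratis API: calls the given task until it
   * succeeds, sleeping a fixed time after each IOException, and rethrows
   * the last IOException once maxAttempts is exhausted.
   */
  static <T> T retryWithSleep(Callable<T> call, long sleepMs, int maxAttempts)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (IOException e) {
        // For example, the server's RPC port is not accepting connections yet.
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMs);
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    final AtomicInteger calls = new AtomicInteger();
    // Simulated request: the first two attempts fail as if the port were not ready.
    final String reply = retryWithSleep(() -> {
      if (calls.incrementAndGet() < 3) {
        throw new IOException("connection refused");
      }
      return "ok";
    }, 100, 10);
    System.out.println(reply + " after " + calls.get() + " attempts");
  }
}
```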



was (Author: szetszwo):
[~jianghuazhu], thanks for your help on debugging!

bq. 1. ... After starting s3 and s4 subsequently, ...

This is a bug in the test.  It starts with 1 server s0 and then adds two new 
servers s3 and s4.   The change is a majority-add which should be disallowed; 
see RATIS-1930.

bq. 2. When the client sends the setConfiguration() command, it cannot 
establish a connection with s3 or s4.  ...

It seems that the servers were starting and RPC ports were not yet ready.  The 
client would retry so it should not be a problem.


> Intermittent failure in 
> TestRaftSnapshotWithGrpc.testInstallSnapshotDuringBootstrap
> -----------------------------------------------------------------------------------
>
>                 Key: RATIS-2261
>                 URL: https://issues.apache.org/jira/browse/RATIS-2261
>             Project: Ratis
>          Issue Type: Bug
>          Components: gRPC, test
>            Reporter: Attila Doroszlai
>            Priority: Major
>         Attachments: 
> org.apache.ratis.grpc.TestRaftSnapshotWithGrpc-output.txt, 
> testInstallSnapshotDuringBootstrap.log
>
>
> {code}
> Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 114.816 s <<< 
> FAILURE! - in org.apache.ratis.grpc.TestRaftSnapshotWithGrpc
> org.apache.ratis.grpc.TestRaftSnapshotWithGrpc.testInstallSnapshotDuringBootstrap
>   Time elapsed: 101.468 s  <<< ERROR!
> java.util.concurrent.TimeoutException: testInstallSnapshotDuringBootstrap() 
> timed out after 100 seconds
>       at java.util.ArrayList.forEach(ArrayList.java:1259)
>       at java.util.ArrayList.forEach(ArrayList.java:1259)
>       Suppressed: java.io.InterruptedIOException: retry 
> policy=RetryForeverWithSleep(sleepTime = 100ms)
>               at 
> org.apache.ratis.client.impl.BlockingImpl.sendRequestWithRetry(BlockingImpl.java:138)
>               at 
> org.apache.ratis.client.impl.AdminImpl.setConfiguration(AdminImpl.java:46)
>               at 
> org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:51)
>               at 
> org.apache.ratis.client.api.AdminApi.setConfiguration(AdminApi.java:45)
>               at 
> org.apache.ratis.server.impl.MiniRaftCluster.setConfiguration(MiniRaftCluster.java:836)
>               at 
> org.apache.ratis.statemachine.RaftSnapshotBaseTest.lambda$testInstallSnapshotDuringBootstrap$6(RaftSnapshotBaseTest.java:309)
>               at 
> org.apache.ratis.server.impl.RaftServerTestUtil.runWithMinorityPeers(RaftServerTestUtil.java:231)
>               at 
> org.apache.ratis.statemachine.RaftSnapshotBaseTest.testInstallSnapshotDuringBootstrap(RaftSnapshotBaseTest.java:308)
> {code}
> Failed in 2/100 runs:
> https://github.com/adoroszlai/ratis/actions/runs/13901407901



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
