[ 
https://issues.apache.org/jira/browse/RATIS-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953871#comment-17953871
 ] 

Tsz-wo Sze commented on RATIS-2306:
-----------------------------------

Hi [~agoncharuk], thanks for trying it out!

bq. A small sleep after the first server start or waiting for a state machine 
configuration change event will make the test pass.

You are right that the server was closed too fast.  Persisting the group
requires a Leader to write to the RaftLog.

In the test, the Leader had not yet been elected before the server was stopped. 
 As a result, the group was not persisted.  
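
To make the timing issue concrete, here is a minimal sketch of a polling wait
that a test could use instead of a fixed sleep.  The class name
WaitForLeaderSketch and the helper waitFor are hypothetical and not part of
Ratis; the "election" is simulated with a plain thread.  In the real test the
condition might be something like
server.getDivision(groupId).getInfo().isLeader(), assuming that accessor is
available in your Ratis version, and being the Leader may still not guarantee
that the first log entry has already been written.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Ratis.
final class WaitForLeaderSketch {
  /**
   * Polls the condition until it holds or the timeout expires.
   * Returns true if the condition held, false on timeout or interruption.
   */
  static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
    final long deadline = System.nanoTime() + timeout.toNanos();
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out before the condition became true
      }
      try {
        Thread.sleep(Math.max(1, pollInterval.toMillis()));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for leader election: the flag flips after a short delay.
    final AtomicBoolean leaderElected = new AtomicBoolean(false);
    final Thread election = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      leaderElected.set(true);
    });
    election.start();

    // Only "stop the server" once the simulated election has completed.
    final boolean ok = waitFor(leaderElected::get, Duration.ofSeconds(5), Duration.ofMillis(10));
    System.out.println("leader elected before stop: " + ok);
    election.join();
  }
}
```

Polling with a deadline keeps the test from depending on a sleep length that
may be too short on a slow machine.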

This problem happens only when starting a new server, since the RaftLog is
empty.  When the log is non-empty, the first log entry has the Group
information.  Could you simply pass the group peers to the RaftServer.Builder
even for RECOVER?
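
In the test from the issue description, this would mean building the second
server with RaftGroup.valueOf(groupId, singlePeer) instead of
RaftGroup.valueOf(groupId), with singlePeer kept in scope for the RECOVER
block.  The toy model below illustrates why that helps.  It is only a sketch
under the assumption that a non-empty log takes precedence and the builder's
peers act as a fallback; ToyLog, resolvePeers and GroupRecoverySketch are
hypothetical names and do not reflect the actual Ratis implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy model only; none of these types exist in Ratis.
final class GroupRecoverySketch {
  /** Stand-in for a Raft log whose first entry, if present, records the group peers. */
  static final class ToyLog {
    private final List<List<String>> entries = new ArrayList<>();

    void append(List<String> peers) {
      entries.add(List.copyOf(peers));
    }

    Optional<List<String>> firstEntryPeers() {
      return entries.isEmpty() ? Optional.empty() : Optional.of(entries.get(0));
    }
  }

  /**
   * Assumed rule: use the peers from the first log entry when the log is
   * non-empty; otherwise fall back to the peers passed to the builder.
   */
  static List<String> resolvePeers(ToyLog log, List<String> builderPeers) {
    return log.firstEntryPeers().orElse(List.copyOf(builderPeers));
  }

  public static void main(String[] args) {
    final ToyLog stoppedBeforeElection = new ToyLog(); // nothing was written
    final ToyLog stoppedAfterElection = new ToyLog();
    stoppedAfterElection.append(List.of("s0"));        // Leader wrote the group

    // RECOVER without peers on an empty log: nothing to recover.
    System.out.println(resolvePeers(stoppedBeforeElection, List.of()));
    // RECOVER with peers on an empty log: the builder's peers are used.
    System.out.println(resolvePeers(stoppedBeforeElection, List.of("s0")));
    // RECOVER on a non-empty log: the logged peers are used.
    System.out.println(resolvePeers(stoppedAfterElection, List.of()));
  }
}
```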


> Initial raft group is not recovered if server is stopped immediately after 
> start
> --------------------------------------------------------------------------------
>
>                 Key: RATIS-2306
>                 URL: https://issues.apache.org/jira/browse/RATIS-2306
>             Project: Ratis
>          Issue Type: Bug
>    Affects Versions: 3.1.3
>            Reporter: Alexey Goncharuk
>            Priority: Major
>
> Following up on a discussion on the user list 
> (https://lists.apache.org/thread/znjzgzvt488w0cf68jcc12nylydfz5vf).
> As discussed on the list, {{RECOVER}} startup option should restore the 
> initial Raft group passed in the builder. However it seems that the 
> persistence of the initial group is asynchronous, and if the server is 
> stopped quickly enough, the bootstrapped group is never recovered. Below is 
> the test that can be run in the Ratis project:
> {code:java}
>   @Test
>   public void testGroupRecoveryOnRestart()
>       throws IOException
>   {
>     File tempDir = Files.createTempDirectory(getClass().getSimpleName()).toFile();
>     RaftPeerId singlePeerId = RaftPeerId.valueOf("s0");
>     RaftGroupId groupId = RaftGroupId.valueOf(UUID.randomUUID());
>     RaftProperties properties = new RaftProperties();
>     RaftConfigKeys.Rpc.setType(properties, RpcType.valueOf("netty"));
>     RaftServerConfigKeys.setStorageDir(properties, Collections.singletonList(tempDir));
>     {
>       RaftPeer singlePeer = RaftPeer
>               .newBuilder()
>               .setId(singlePeerId)
>               .setAddress(NetUtils.localhostWithFreePort())
>               .setAdminAddress(NetUtils.localhostWithFreePort())
>               .setClientAddress(NetUtils.localhostWithFreePort())
>               .setDataStreamAddress(NetUtils.localhostWithFreePort())
>               .build();
>       try (RaftServer server = RaftServer.newBuilder()
>               .setServerId(singlePeerId)
>               .setGroup(RaftGroup.valueOf(groupId, singlePeer))
>               .setOption(RaftStorage.StartupOption.FORMAT)
>               .setStateMachine(new SimpleStateMachine4Testing())
>               .setProperties(properties)
>               .build()) {
>         server.start();
>         System.out.println("Started with group: " + server.getDivision(groupId).getInfo());
>       }
>     }
>     {
>       // Restart with RECOVER option
>       try (RaftServer server = RaftServer.newBuilder()
>               .setServerId(singlePeerId)
>               .setGroup(RaftGroup.valueOf(groupId))
>               .setOption(RaftStorage.StartupOption.RECOVER)
>               .setStateMachine(new SimpleStateMachine4Testing())
>               .setProperties(properties)
>               .build()) {
>         server.start();
>         RaftGroup group = Iterables.getOnlyElement(server.getGroups());
>         Assertions.assertEquals(groupId, group.getGroupId());
>         Assertions.assertEquals(1, group.getPeers().size());
>         Assertions.assertEquals(singlePeerId, Iterables.getOnlyElement(group.getPeers()).getId());
>       }
>     }
>   }
> {code}
> A small sleep after the first server start or waiting for a state machine 
> configuration change event will make the test pass.
> A similar test with {{MiniRaftCluster}} will pass because the cluster caches 
> the group as an internal field and on restart the peers are actually taken 
> from the builder, and not from the recovered state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
