Alexey Goncharuk created RATIS-2306:
---------------------------------------

             Summary: Initial raft group is not recovered if server is stopped immediately after start
                 Key: RATIS-2306
                 URL: https://issues.apache.org/jira/browse/RATIS-2306
             Project: Ratis
          Issue Type: Bug
    Affects Versions: 3.1.3
            Reporter: Alexey Goncharuk


Following up on a discussion on the user list
(https://lists.apache.org/thread/znjzgzvt488w0cf68jcc12nylydfz5vf).
As discussed there, the `RECOVER` startup option should restore the initial
Raft group passed to the builder. However, it seems that the initial group is
persisted asynchronously, and if the server is stopped quickly enough, the
bootstrapped group is never recovered. Below is a test that reproduces the
issue and can be run in the Ratis project:
```
  @Test
  public void testGroupRecoveryOnRestart() throws IOException {
    File tempDir = Files.createTempDirectory(getClass().getSimpleName()).toFile();

    RaftPeerId singlePeerId = RaftPeerId.valueOf("s0");
    RaftGroupId groupId = RaftGroupId.valueOf(UUID.randomUUID());

    RaftProperties properties = new RaftProperties();
    RaftConfigKeys.Rpc.setType(properties, RpcType.valueOf("netty"));
    RaftServerConfigKeys.setStorageDir(properties, Collections.singletonList(tempDir));

    {
      // First start: format the storage and bootstrap a single-peer group.
      RaftPeer singlePeer = RaftPeer
              .newBuilder()
              .setId(singlePeerId)
              .setAddress(NetUtils.localhostWithFreePort())
              .setAdminAddress(NetUtils.localhostWithFreePort())
              .setClientAddress(NetUtils.localhostWithFreePort())
              .setDataStreamAddress(NetUtils.localhostWithFreePort())
              .build();

      try (RaftServer server = RaftServer.newBuilder()
              .setServerId(singlePeerId)
              .setGroup(RaftGroup.valueOf(groupId, singlePeer))
              .setOption(RaftStorage.StartupOption.FORMAT)
              .setStateMachine(new SimpleStateMachine4Testing())
              .setProperties(properties)
              .build()) {
        server.start();
        System.out.println("Started with group: " + server.getDivision(groupId).getInfo());
      } // The server is closed right away, possibly before the group is persisted.
    }
    {
      // Restart with the RECOVER option: the group should be restored from storage.
      try (RaftServer server = RaftServer.newBuilder()
              .setServerId(singlePeerId)
              .setGroup(RaftGroup.valueOf(groupId))
              .setOption(RaftStorage.StartupOption.RECOVER)
              .setStateMachine(new SimpleStateMachine4Testing())
              .setProperties(properties)
              .build()) {
        server.start();
        RaftGroup group = Iterables.getOnlyElement(server.getGroups());
        Assertions.assertEquals(groupId, group.getGroupId());
        Assertions.assertEquals(1, group.getPeers().size());
        Assertions.assertEquals(singlePeerId, Iterables.getOnlyElement(group.getPeers()).getId());
      }
    }
  }
```
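To make the suspected race concrete, here is a toy model in plain Java with no Ratis dependency. It is not Ratis code: `ToyServer`, `disk`, `runScenario` and `recover` are hypothetical names, and the model assumes (following the report's "it seems") that the bootstrap group is written by a background task that `close()` does not wait for. A latch that is never released stands in for persistence that has not finished when the server is stopped, which keeps the outcome deterministic.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Toy model (hypothetical, NOT Ratis code) of asynchronous bootstrap-group persistence. */
class AsyncBootstrapRace {
  /** Simulated storage directory: groupId -> persisted peers. Survives "restarts". */
  static final Map<String, String> disk = new ConcurrentHashMap<>();

  static final class ToyServer implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final CountDownLatch ioDone; // released when the simulated disk write may proceed

    ToyServer(CountDownLatch ioDone) {
      this.ioDone = ioDone;
    }

    /** Returns immediately; the group is written to "disk" later, in the background. */
    Future<?> start(String groupId, String peers) {
      return executor.submit(() -> {
        try {
          ioDone.await();            // simulated slow I/O before the write
          disk.put(groupId, peers);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // aborted by close(): nothing persisted
        }
      });
    }

    /** Stops the server WITHOUT waiting for pending persistence work. */
    @Override
    public void close() throws InterruptedException {
      executor.shutdownNow();
      executor.awaitTermination(5, TimeUnit.SECONDS);
    }
  }

  /** RECOVER-style startup: only groups found on "disk" come back. */
  static Optional<String> recover(String groupId) {
    return Optional.ofNullable(disk.get(groupId));
  }

  /** Starts, optionally waits for persistence, closes, then tries to recover the group. */
  static Optional<String> runScenario(String groupId, boolean waitForPersistence) throws Exception {
    // Closed latch = the write is still pending when close() is called.
    CountDownLatch ioDone = new CountDownLatch(waitForPersistence ? 0 : 1);
    try (ToyServer server = new ToyServer(ioDone)) {
      Future<?> persisted = server.start(groupId, "s0");
      if (waitForPersistence) {
        persisted.get(5, TimeUnit.SECONDS); // deterministic version of "a small sleep"
      }
    } // close() runs here, like the end of the first try block in the test
    return recover(groupId);
  }

  public static void main(String[] args) throws Exception {
    System.out.println("stop immediately: " + runScenario("group-a", false));
    System.out.println("wait, then stop:  " + runScenario("group-b", true));
  }
}
```

Running `main` prints `Optional.empty` for the immediate stop and `Optional[s0]` when the caller waits on the persistence future before closing, mirroring the failing and passing variants of the test.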
Adding a short sleep after the first server start, or waiting for the state
machine's configuration-change event, makes the test pass.
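A fixed sleep only makes the race less likely. A sketch of a bounded polling wait that could replace the sleep in the test follows; `WaitUtil.waitFor` is a hypothetical helper written for this example, not a Ratis API, and the background thread in `main` merely simulates the asynchronous write.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper, not part of Ratis. */
final class WaitUtil {
  /**
   * Polls {@code condition} every {@code interval} until it returns true or
   * {@code timeout} elapses. Returns whether the condition was observed true.
   */
  static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration interval)
      throws InterruptedException {
    final long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) { // overflow-safe comparison
        return false; // timed out without seeing the condition
      }
      Thread.sleep(interval.toMillis());
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicBoolean persisted = new AtomicBoolean(false);
    // Stand-in for the background persistence of the bootstrap group.
    Thread writer = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException ignored) {
        // exit early; the flag is still set below
      }
      persisted.set(true);
    });
    writer.start();
    boolean ok = waitFor(persisted::get, Duration.ofSeconds(5), Duration.ofMillis(10));
    System.out.println("persisted before stop: " + ok);
    writer.join();
  }
}
```

In the real test, the condition would need to be a signal that the group has actually been persisted. The report indicates that waiting for the state machine's configuration-change event works; a check on the storage directory would depend on Ratis's on-disk layout, which is an assumption not verified here.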



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
