[ https://issues.apache.org/jira/browse/RATIS-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Goncharuk updated RATIS-2306:
------------------------------------
    Description: 
Following up on a discussion on the user list 
(https://lists.apache.org/thread/znjzgzvt488w0cf68jcc12nylydfz5vf).
As discussed on the list, the `RECOVER` startup option should restore the 
initial Raft group passed in the builder. However, it seems that the initial 
group is persisted asynchronously: if the server is stopped quickly enough, the 
bootstrapped group is never recovered. The test below, which can be run in the 
Ratis project, reproduces this:
{code:java}
  @Test
  public void testGroupRecoveryOnRestart()
      throws IOException
  {
    File tempDir = Files.createTempDirectory(getClass().getSimpleName()).toFile();

    RaftPeerId singlePeerId = RaftPeerId.valueOf("s0");
    RaftGroupId groupId = RaftGroupId.valueOf(UUID.randomUUID());

    RaftProperties properties = new RaftProperties();
    RaftConfigKeys.Rpc.setType(properties, RpcType.valueOf("netty"));
    RaftServerConfigKeys.setStorageDir(properties, Collections.singletonList(tempDir));

    {
      RaftPeer singlePeer = RaftPeer
              .newBuilder()
              .setId(singlePeerId)
              .setAddress(NetUtils.localhostWithFreePort())
              .setAdminAddress(NetUtils.localhostWithFreePort())
              .setClientAddress(NetUtils.localhostWithFreePort())
              .setDataStreamAddress(NetUtils.localhostWithFreePort())
              .build();

      try (RaftServer server = RaftServer.newBuilder()
              .setServerId(singlePeerId)
              .setGroup(RaftGroup.valueOf(groupId, singlePeer))
              .setOption(RaftStorage.StartupOption.FORMAT)
              .setStateMachine(new SimpleStateMachine4Testing())
              .setProperties(properties)
              .build()) {
        server.start();
        System.out.println("Started with group: " + server.getDivision(groupId).getInfo());
      }
    }
    {
      // Restart with RECOVER option
      try (RaftServer server = RaftServer.newBuilder()
              .setServerId(singlePeerId)
              .setGroup(RaftGroup.valueOf(groupId))
              .setOption(RaftStorage.StartupOption.RECOVER)
              .setStateMachine(new SimpleStateMachine4Testing())
              .setProperties(properties)
              .build()) {
        server.start();
        RaftGroup group = Iterables.getOnlyElement(server.getGroups());
        Assertions.assertEquals(groupId, group.getGroupId());
        Assertions.assertEquals(1, group.getPeers().size());
        Assertions.assertEquals(singlePeerId, Iterables.getOnlyElement(group.getPeers()).getId());
      }
    }
  }
{code}
Adding a short sleep after the first server start, or waiting for a state 
machine configuration change event, makes the test pass.
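
The second workaround can be sketched with a small latch-based helper. This is 
only an illustration under stated assumptions: `ConfigChangeGate` and its 
methods are hypothetical names, not part of Ratis, and the sketch assumes the 
test's state machine calls `onConfigurationChanged()` from its configuration 
change notification. The `main` method simulates that callback with a plain 
background thread so the example runs on its own.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical test helper (not part of Ratis): blocks until a callback fires. */
final class ConfigChangeGate {
  private final CountDownLatch changed = new CountDownLatch(1);

  /** Assumed to be called by the state machine when it sees a configuration change. */
  void onConfigurationChanged() {
    changed.countDown();
  }

  /** Returns true if the callback fired within the timeout, false otherwise. */
  boolean await(long timeoutMillis) {
    try {
      return changed.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      // Preserve the interrupt status and report that the wait did not succeed.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    ConfigChangeGate gate = new ConfigChangeGate();

    // Stand-in for the server's asynchronous work; a real test would rely on
    // the state machine invoking the callback instead.
    Thread background = new Thread(gate::onConfigurationChanged);
    background.start();

    if (!gate.await(5_000)) {
      throw new IllegalStateException("configuration change was not observed in time");
    }
    System.out.println("configuration change observed; safe to stop the server");
  }
}
```

In the test above, the wait would go after the first `server.start()` and 
before the try-with-resources block closes the server. Whether the 
configuration change event is delivered only after the group has been durably 
persisted is what the observed behaviour suggests, not something this sketch 
verifies.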


> Initial raft group is not recovered if server is stopped immediately after 
> start
> --------------------------------------------------------------------------------
>
>                 Key: RATIS-2306
>                 URL: https://issues.apache.org/jira/browse/RATIS-2306
>             Project: Ratis
>          Issue Type: Bug
>    Affects Versions: 3.1.3
>            Reporter: Alexey Goncharuk
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
