Alexey Goncharuk created RATIS-2306:
---------------------------------------
Summary: Initial raft group is not recovered if server is stopped
immediately after start
Key: RATIS-2306
URL: https://issues.apache.org/jira/browse/RATIS-2306
Project: Ratis
Issue Type: Bug
Affects Versions: 3.1.3
Reporter: Alexey Goncharuk
Following up on a discussion on the user list
(https://lists.apache.org/thread/znjzgzvt488w0cf68jcc12nylydfz5vf)
As discussed on the list, the `RECOVER` startup option should restore the
initial Raft group passed in the builder. However, it seems that the initial
group is persisted asynchronously, so if the server is stopped quickly enough,
the bootstrapped group is never recovered. The test below, which can be run in
the Ratis project, reproduces this:
```
@Test
public void testGroupRecoveryOnRestart() throws IOException {
  File tempDir =
      Files.createTempDirectory(getClass().getSimpleName()).toFile();
  RaftPeerId singlePeerId = RaftPeerId.valueOf("s0");
  RaftGroupId groupId = RaftGroupId.valueOf(UUID.randomUUID());
  RaftProperties properties = new RaftProperties();
  RaftConfigKeys.Rpc.setType(properties, RpcType.valueOf("netty"));
  RaftServerConfigKeys.setStorageDir(properties,
      Collections.singletonList(tempDir));

  {
    RaftPeer singlePeer = RaftPeer
        .newBuilder()
        .setId(singlePeerId)
        .setAddress(NetUtils.localhostWithFreePort())
        .setAdminAddress(NetUtils.localhostWithFreePort())
        .setClientAddress(NetUtils.localhostWithFreePort())
        .setDataStreamAddress(NetUtils.localhostWithFreePort())
        .build();

    try (RaftServer server = RaftServer.newBuilder()
        .setServerId(singlePeerId)
        .setGroup(RaftGroup.valueOf(groupId, singlePeer))
        .setOption(RaftStorage.StartupOption.FORMAT)
        .setStateMachine(new SimpleStateMachine4Testing())
        .setProperties(properties)
        .build()) {
      server.start();
      System.out.println("Started with group: " +
          server.getDivision(groupId).getInfo());
    }
  }

  {
    // Restart with RECOVER option
    try (RaftServer server = RaftServer.newBuilder()
        .setServerId(singlePeerId)
        .setGroup(RaftGroup.valueOf(groupId))
        .setOption(RaftStorage.StartupOption.RECOVER)
        .setStateMachine(new SimpleStateMachine4Testing())
        .setProperties(properties)
        .build()) {
      server.start();
      RaftGroup group = Iterables.getOnlyElement(server.getGroups());
      Assertions.assertEquals(groupId, group.getGroupId());
      Assertions.assertEquals(1, group.getPeers().size());
      Assertions.assertEquals(singlePeerId,
          Iterables.getOnlyElement(group.getPeers()).getId());
    }
  }
}
```
Adding a short sleep after the first server start, or waiting for a state
machine configuration change event before stopping the server, makes the test
pass.
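
For anyone who wants to try the workaround without a fixed sleep, here is a
minimal sketch of a polling helper in plain JDK Java. `WaitUtil` and `waitFor`
are hypothetical names and not part of Ratis. The condition to poll is left to
the caller, since I am not sure which observable state reliably indicates that
the initial group has been persisted.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Ratis: polls a condition until it holds
// or a timeout expires.
final class WaitUtil {
  private WaitUtil() {}

  /**
   * Returns true as soon as the condition holds, or false once the timeout
   * has elapsed. The condition is always checked at least once.
   */
  static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration pollInterval)
      throws InterruptedException {
    final long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      // Overflow-safe comparison of nanoTime values.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(pollInterval.toMillis());
    }
  }
}
```

In the test above, a call such as `WaitUtil.waitFor(flag::get, Duration.ofSeconds(5), Duration.ofMillis(50))`
could be placed after the first `server.start()`, where `flag` is assumed to be
an `AtomicBoolean` set by the test's state machine when it observes the
configuration change event. That wiring is an assumption and is not shown here.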