[
https://issues.apache.org/jira/browse/RATIS-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953871#comment-17953871
]
Tsz-wo Sze commented on RATIS-2306:
-----------------------------------
Hi [~agoncharuk], thanks for trying it out!
bq. A small sleep after the first server start or waiting for a state machine
configuration change event will make the test pass.
You are right that the server was closed too fast. Persisting the group
requires a Leader to write to the RaftLog.
In the test, the Leader had not yet been elected before the server was stopped.
As a result, the group was not persisted.
This problem happens only when starting a new server, since the RaftLog is empty.
When the log is non-empty, the first log entry has the Group information. Could
you simply pass the group peers to the RaftServer.Builder even for RECOVER?
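For the test itself, an alternative to a fixed sleep is to poll until some condition holds before closing the first server. Below is a minimal sketch using only the JDK. The names WaitUtil and waitFor are hypothetical and not part of Ratis. The condition to pass in (for example, a check that the division has become the Leader) is left to the caller and is an assumption here.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper (not a Ratis class): poll a condition until it holds or a timeout expires. */
final class WaitUtil {
  private WaitUtil() {}

  /**
   * Returns true as soon as the condition holds.
   * Returns false if the timeout elapses first or the thread is interrupted.
   */
  static boolean waitFor(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
    final long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      // Overflow-safe comparison against the nanoTime-based deadline.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        // Sleep at least 1 ms so that a zero interval does not busy-spin.
        Thread.sleep(Math.max(1L, pollInterval.toMillis()));
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

In the test, such a helper would be called after the first server.start() and before the try block closes the server, so that the Leader has a chance to write the group to the RaftLog.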
> Initial raft group is not recovered if server is stopped immediately after
> start
> --------------------------------------------------------------------------------
>
> Key: RATIS-2306
> URL: https://issues.apache.org/jira/browse/RATIS-2306
> Project: Ratis
> Issue Type: Bug
> Affects Versions: 3.1.3
> Reporter: Alexey Goncharuk
> Priority: Major
>
> Following up a discussion on the user list
> (https://lists.apache.org/thread/znjzgzvt488w0cf68jcc12nylydfz5vf)
> As discussed on the list, the {{RECOVER}} startup option should restore the
> initial Raft group passed in the builder. However, it seems that the
> persistence of the initial group is asynchronous, and if the server is
> stopped quickly enough, the bootstrapped group is never recovered. Below is
> a test that can be run in the Ratis project:
> {code:java}
> @Test
> public void testGroupRecoveryOnRestart() throws IOException {
>   File tempDir = Files.createTempDirectory(getClass().getSimpleName()).toFile();
>   RaftPeerId singlePeerId = RaftPeerId.valueOf("s0");
>   RaftGroupId groupId = RaftGroupId.valueOf(UUID.randomUUID());
>   RaftProperties properties = new RaftProperties();
>   RaftConfigKeys.Rpc.setType(properties, RpcType.valueOf("netty"));
>   RaftServerConfigKeys.setStorageDir(properties, Collections.singletonList(tempDir));
>   {
>     RaftPeer singlePeer = RaftPeer.newBuilder()
>         .setId(singlePeerId)
>         .setAddress(NetUtils.localhostWithFreePort())
>         .setAdminAddress(NetUtils.localhostWithFreePort())
>         .setClientAddress(NetUtils.localhostWithFreePort())
>         .setDataStreamAddress(NetUtils.localhostWithFreePort())
>         .build();
>     try (RaftServer server = RaftServer.newBuilder()
>         .setServerId(singlePeerId)
>         .setGroup(RaftGroup.valueOf(groupId, singlePeer))
>         .setOption(RaftStorage.StartupOption.FORMAT)
>         .setStateMachine(new SimpleStateMachine4Testing())
>         .setProperties(properties)
>         .build()) {
>       server.start();
>       System.out.println("Started with group: " + server.getDivision(groupId).getInfo());
>     }
>   }
>   {
>     // Restart with RECOVER option
>     try (RaftServer server = RaftServer.newBuilder()
>         .setServerId(singlePeerId)
>         .setGroup(RaftGroup.valueOf(groupId))
>         .setOption(RaftStorage.StartupOption.RECOVER)
>         .setStateMachine(new SimpleStateMachine4Testing())
>         .setProperties(properties)
>         .build()) {
>       server.start();
>       RaftGroup group = Iterables.getOnlyElement(server.getGroups());
>       Assertions.assertEquals(groupId, group.getGroupId());
>       Assertions.assertEquals(1, group.getPeers().size());
>       Assertions.assertEquals(singlePeerId, Iterables.getOnlyElement(group.getPeers()).getId());
>     }
>   }
> }
> {code}
> A small sleep after the first server start or waiting for a state machine
> configuration change event will make the test pass.
> A similar test with {{MiniRaftCluster}} will pass because the cluster caches
> the group as an internal field, and on restart the peers are taken
> from the builder rather than from the recovered state.