[
https://issues.apache.org/jira/browse/RATIS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870233#comment-17870233
]
Tsz-wo Sze commented on RATIS-2115:
-----------------------------------
Copying the details here:
- [*ERROR*] *
TestLeaderInstallSnapshot.testInstallSnapshotLeaderSwitch:53->InstallSnapshotFromLeaderTests.testInstallSnapshotLeaderSwitch:94->InstallSnapshotFromLeaderTests.testInstallSnapshotDuringLeaderSwitch:170
Unexpected exception type thrown, expected:
<org.apache.ratis.protocol.exceptions.RaftRetryFailureException> but was:
<org.apache.ratis.protocol.exceptions.ReconfigurationTimeoutException>*
This is a ParameterizedTest with separateHeartbeat = false. This case is okay:
the only difference is the exception type, as shown in the message above
(RaftRetryFailureException expected, ReconfigurationTimeoutException thrown).
- [*ERROR*] *
TestLeaderInstallSnapshot.testInstallSnapshotLeaderSwitch:53->InstallSnapshotFromLeaderTests.testInstallSnapshotLeaderSwitch:94->InstallSnapshotFromLeaderTests.testInstallSnapshotDuringLeaderSwitch:141
» IllegalState*
This is the same ParameterizedTest with separateHeartbeat = true. The
IllegalStateException was thrown for "No leader yet". The test blocked the
peers, so they could not vote for each other, and every election round timed
out with 0 responses, as shown below. This case is a test problem.
2024-06-17 00:13:26,950 [s2@group-F9E5FA170112-LeaderElection6] INFO
impl.LeaderElection (LeaderElection.java:logAndReturn(89)) -
s2@group-F9E5FA170112-LeaderElection6: PRE_VOTE TIMEOUT received 0 response(s)
and 0 exception(s):
2024-06-17 00:13:26,978 [s1@group-F9E5FA170112-LeaderElection7] INFO
impl.LeaderElection (LeaderElection.java:logAndReturn(89)) -
s1@group-F9E5FA170112-LeaderElection7: PRE_VOTE TIMEOUT received 0 response(s)
and 0 exception(s):
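A minimal sketch of why 0 responses can never produce a leader, using plain
Raft majority arithmetic. This is not Ratis code: the class and method names
are hypothetical, and the group size of 3 is an assumption, since the log
above does not state it.

```java
// Sketch only: generic Raft majority arithmetic, not the Ratis implementation.
final class PreVoteQuorumSketch {
  /** Returns true if the candidate's own vote plus the grants it received form a majority. */
  static boolean hasMajority(int grantedByOthers, int groupSize) {
    final int votes = 1 + grantedByOthers; // the candidate counts its own vote
    return votes > groupSize / 2;
  }

  public static void main(String[] args) {
    final int groupSize = 3; // assumption: the group size is not stated in the log
    System.out.println("0 responses -> majority? " + hasMajority(0, groupSize));
    System.out.println("1 grant     -> majority? " + hasMajority(1, groupSize));
  }
}
```

Under these assumptions, a candidate whose peers are all blocked stays at 1
vote out of 3, so each round ends in a timeout, which matches the repeated
PRE_VOTE TIMEOUT lines.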
- [*ERROR*] *
TestLeaderInstallSnapshot.testSeparateSnapshotInstallPath(Boolean)[1] »
Timeout*
- [*ERROR*] *
TestLeaderInstallSnapshot.testSeparateSnapshotInstallPath(Boolean)[2] »
Timeout*
The test started with only 1 server. Both test cases pass after changing it to
start 3 servers.
- [*ERROR*] * TestRetryCacheWithNettyRpc.testRetryOnNewLeader » Timeout
testRetryOnNewLeader...*
The timeout occurred because setConf kept failing: it tried to remove 2 peers
and add 2 peers at the same time. The test passes when setConf is changed to
either (1) remove 1 peer and add 1 peer, or (2) remove 2 peers and add 0 peers.
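The three setConf variants above can be compared with a small membership
calculation. This is only a sketch: the peer names s0..s4, the starting group
of 3 peers, and the class and method names are all assumptions for
illustration, and the "survivors have a majority" check is a hypothetical
observation, not the rule Ratis applies and not a confirmed cause of the
failure.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: set arithmetic on peer ids, not the Ratis setConfiguration logic.
final class SetConfSketch {
  /** Computes the new peer set after removing and adding peers in one step. */
  static Set<String> newConf(List<String> old, List<String> removed, List<String> added) {
    final Set<String> conf = new LinkedHashSet<>(old);
    conf.removeAll(removed);
    conf.addAll(added);
    return conf;
  }

  /** Returns true if the peers kept from the old group form a majority of the new conf. */
  static boolean survivorsHaveMajority(List<String> old, Set<String> conf) {
    final long survivors = conf.stream().filter(old::contains).count();
    return survivors > conf.size() / 2;
  }

  public static void main(String[] args) {
    final List<String> old = List.of("s0", "s1", "s2"); // assumed 3-peer starting group

    final Set<String> failing = newConf(old, List.of("s1", "s2"), List.of("s3", "s4"));
    final Set<String> option1 = newConf(old, List.of("s2"), List.of("s3"));
    final Set<String> option2 = newConf(old, List.of("s1", "s2"), List.of());

    System.out.println(failing + " survivors majority? " + survivorsHaveMajority(old, failing));
    System.out.println(option1 + " survivors majority? " + survivorsHaveMajority(old, option1));
    System.out.println(option2 + " survivors majority? " + survivorsHaveMajority(old, option2));
  }
}
```

Under the assumed 3-peer group, only the failing variant (remove 2, add 2)
leaves the surviving original peers short of a majority in the new
configuration. That is consistent with the two passing variants but does not
by itself establish why setConf failed.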
> Fix flaky tests in 3.1.0-rc1
> ----------------------------
>
> Key: RATIS-2115
> URL: https://issues.apache.org/jira/browse/RATIS-2115
> Project: Ratis
> Issue Type: Test
> Components: test
> Affects Versions: 3.1.0
> Reporter: Song Ziyang
> Priority: Minor
>
> As discussed in
> [https://lists.apache.org/thread/ygz28ff3ljoz1bbphn9w463hgqrl06nc]