Hi William and others,

I now understand the failures/errors better.  All of them turned out to be
test problems; see below:


- [*ERROR*] *  
TestLeaderInstallSnapshot.testInstallSnapshotLeaderSwitch:53->InstallSnapshotFromLeaderTests.testInstallSnapshotLeaderSwitch:94->InstallSnapshotFromLeaderTests.testInstallSnapshotDuringLeaderSwitch:170
Unexpected exception type thrown, expected:
<org.apache.ratis.protocol.exceptions.RaftRetryFailureException> but was:
<org.apache.ratis.protocol.exceptions.ReconfigurationTimeoutException>*

This is a ParameterizedTest with separateHeartbeat = false.
This case is okay: only the exception type has changed, as shown in the
message above (RaftRetryFailureException vs ReconfigurationTimeoutException).


- [*ERROR*] *  
TestLeaderInstallSnapshot.testInstallSnapshotLeaderSwitch:53->InstallSnapshotFromLeaderTests.testInstallSnapshotLeaderSwitch:94->InstallSnapshotFromLeaderTests.testInstallSnapshotDuringLeaderSwitch:141
» IllegalState*

This is the same ParameterizedTest with separateHeartbeat = true.
The IllegalStateException was for "No leader yet".  The test blocked the
peers so that they could not vote for each other, and they kept getting 0
responses, as shown below.
This case is a test problem.

2024-06-17 00:13:26,950 [s2@group-F9E5FA170112-LeaderElection6] INFO
 impl.LeaderElection (LeaderElection.java:logAndReturn(89)) -
s2@group-F9E5FA170112-LeaderElection6: PRE_VOTE TIMEOUT received 0
response(s) and 0 exception(s):
2024-06-17 00:13:26,978 [s1@group-F9E5FA170112-LeaderElection7] INFO
 impl.LeaderElection (LeaderElection.java:logAndReturn(89)) -
s1@group-F9E5FA170112-LeaderElection7: PRE_VOTE TIMEOUT received 0
response(s) and 0 exception(s):
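
To illustrate why 0 responses means "No leader yet", here is a minimal
sketch of a Raft-style majority check.  It is not the Ratis implementation:
the class and method names are hypothetical, and the group size of 3 is an
assumption since the log above does not state it.

```java
/** Simplified majority check for a Raft-style (pre-)vote; NOT the Ratis implementation. */
class QuorumSketch {
  /**
   * @param groupSize       number of voting peers in the group
   * @param grantedByOthers votes granted by the other peers
   * @return true if the candidate's own vote plus the granted votes form a strict majority
   */
  static boolean hasMajority(int groupSize, int grantedByOthers) {
    final int votes = 1 + grantedByOthers; // the candidate counts its own vote
    return votes > groupSize / 2;
  }

  public static void main(String[] args) {
    final int groupSize = 3; // assumption: the group size is not stated in the log
    System.out.println("0 responses    -> majority? " + hasMajority(groupSize, 0));
    System.out.println("1 granted vote -> majority? " + hasMajority(groupSize, 1));
  }
}
```

Under these assumptions, a candidate that hears from nobody holds only its
own vote, so as long as the peers stay blocked no election can succeed.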


- [*ERROR*] *  
TestLeaderInstallSnapshot.testSeparateSnapshotInstallPath(Boolean)[1]
» Timeout*

- [*ERROR*] *  
TestLeaderInstallSnapshot.testSeparateSnapshotInstallPath(Boolean)[2]
» Timeout*


The test started with only 1 server.  Both test cases pass after changing
to 3 servers.

- [*ERROR*] *  TestRetryCacheWithNettyRpc.testRetryOnNewLeader » Timeout
testRetryOnNewLeader...*

The timeout was because setConf kept failing.  The problem was that the
setConf tried to remove 2 peers and add 2 peers at the same time.  After
changing the setConf to either (1) remove 1 peer and add 1 peer, or (2)
remove 2 peers and add 0 peers, the test passes.
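
To make the three setConf variants concrete, here is a small sketch of the
peer-list arithmetic only.  It assumes a hypothetical 3-peer group s0, s1,
s2 and new peers s3, s4 (the real test's peer ids and group size are not
stated here), makes no Ratis calls, and does not try to explain why the
first variant failed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Peer-list arithmetic for the setConf variants; hypothetical peer ids, no Ratis calls. */
class SetConfSketch {
  /** Returns the target peer list: current peers minus the removed ones, plus the added ones. */
  static List<String> target(List<String> current, List<String> removed, List<String> added) {
    final List<String> result = new ArrayList<>(current);
    result.removeAll(removed);
    result.addAll(added);
    return result;
  }

  public static void main(String[] args) {
    // Assumption: a 3-peer starting group; s3 and s4 are the hypothetical new peers.
    final List<String> current = Arrays.asList("s0", "s1", "s2");
    final List<String> none = Arrays.asList();

    // Original test: remove 2 peers and add 2 peers in one setConf (kept failing).
    System.out.println(target(current, Arrays.asList("s1", "s2"), Arrays.asList("s3", "s4")));
    // Variant (1): remove 1 peer and add 1 peer.
    System.out.println(target(current, Arrays.asList("s2"), Arrays.asList("s3")));
    // Variant (2): remove 2 peers and add 0 peers.
    System.out.println(target(current, Arrays.asList("s1", "s2"), none));
  }
}
```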

Tsz-Wo


On Sat, Jun 15, 2024 at 8:54 AM Tsz Wo Sze <[email protected]> wrote:

> Hi William,
>
> I have been running tests for the past few days.  Unfortunately, I got
> similar failures in rc1 as we found in rc0.  I will dig deeper to see why
> these tests are failing.
>
> [*ERROR*] *Failures: *
>
> [*ERROR*] *
> TestLeaderInstallSnapshot.testInstallSnapshotLeaderSwitch:53->InstallSnapshotFromLeaderTests.testInstallSnapshotLeaderSwitch:94->InstallSnapshotFromLeaderTests.testInstallSnapshotDuringLeaderSwitch:170
> Unexpected exception type thrown, expected:
> <org.apache.ratis.protocol.exceptions.RaftRetryFailureException> but was:
> <org.apache.ratis.protocol.exceptions.ReconfigurationTimeoutException>*
>
> [*ERROR*] *Errors: *
>
> [*ERROR*] *
> TestLeaderInstallSnapshot.testInstallSnapshotLeaderSwitch:53->InstallSnapshotFromLeaderTests.testInstallSnapshotLeaderSwitch:94->InstallSnapshotFromLeaderTests.testInstallSnapshotDuringLeaderSwitch:141
> » IllegalState*
>
> [*ERROR*] *
> TestLeaderInstallSnapshot.testSeparateSnapshotInstallPath(Boolean)[1] »
> Timeout*
>
> [*ERROR*] *
> TestLeaderInstallSnapshot.testSeparateSnapshotInstallPath(Boolean)[2] »
> Timeout*
>
> [*ERROR*] *  TestRetryCacheWithNettyRpc.testRetryOnNewLeader » Timeout
> testRetryOnNewLeader...*
>
> [*INFO*]
>
> [*ERROR*] *Tests run: 328, Failures: 1, Errors: 4, Skipped: 0*
>
> Tsz-Wo
>
>
> On Wed, Jun 12, 2024 at 2:48 PM William Song <[email protected]> wrote:
>
>> Hi Community,
>>
>> I’m calling a vote for Apache Ratis Release 3.1.0 rc1.
>>
>> The git tag to be voted upon:
>> https://github.com/apache/ratis/tree/ratis-3.1.0-rc1
>>
>> The git commit hash:
>> 9ed4e3eca792d96aafa4f43ba5dfe1b9650a522c
>>
>> The source and binary tarballs can be found at:
>> https://dist.apache.org/repos/dist/dev/ratis/3.1.0/rc1
>>
>> Fingerprint of the GPG key release artifacts are signed with:
>> DCE2 C33D 41C6 2578 969D BAFE 37D6 ECF8 4E78 BC92
>>
>> My public key to verify signatures can be found in:
>> https://dist.apache.org/repos/dist/dev/ratis/KEYS
>>
>> Maven artifacts are staged at:
>> https://repository.apache.org/content/repositories/orgapacheratis-1146
>>
>> This vote will remain open for at least 72 hours.
>> Please vote on releasing this ratis-3.1.0-rc1. Thanks in advance.
>>
>> [ ] +1 approve
>> [ ] 0 no opinion
>> [ ] -1 disapprove (and reason why)
>>
>> Starting with my +1(binding)
>> - Verified checksums, signatures and git hash.
>> - Checked LICENSE and NOTICE.
>> - Compared the files in src tarball with the files at the given git tag.
>> - Built from source.
>> - Ran regular Ratis CI. [1]
>>
>> [1] https://github.com/apache/ratis/actions/runs/9477559508
>
>
