[ https://issues.apache.org/jira/browse/HDDS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414971#comment-17414971 ]
Attila Doroszlai commented on HDDS-5631:
----------------------------------------
I think this is a test bug: {{testInstallOldCheckpointFailure}} updates the state
machine term/index from its own thread rather than from the {{*-StateMachineUpdater}} thread:
{code}
2021-09-14 12:29:25,704 [Listener at 127.0.0.1/11932] INFO ha.SCMHADBTransactionBufferImpl (SCMHADBTransactionBufferImpl.java:updateLatestTrxInfo(81)) - ZZZ Update latest tx info: 2#25 -> 2#128
...
2021-09-14 12:29:25,706 [949a75fa-0f39-4afc-b038-8df849b8be69@group-D0B2DE51B071-StateMachineUpdater] ERROR impl.StateMachineUpdater (StateMachineUpdater.java:run(191)) - 949a75fa-0f39-4afc-b038-8df849b8be69@group-D0B2DE51B071-StateMachineUpdater caught a Throwable.
java.lang.IllegalStateException: 949a75fa-0f39-4afc-b038-8df849b8be69: Failed updateLastAppliedTermIndex: newTI = (t:2, i:26) < oldTI = (t:2, i:128)
{code}
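A minimal Java sketch of why the out-of-band update trips the updater's check, assuming the last applied (term, index) may only move forward. {{LastAppliedSketch}}, {{TermIndex}}, {{LastAppliedGuard}} and the thread names are hypothetical and only mimic the message in the log above; this is not the Ratis {{StateMachineUpdater}} code. Once another thread has moved the value to (t:2, i:128), the updater's next in-order entry (t:2, i:26) looks like a regression and is rejected.

```java
/**
 * Hypothetical sketch: a "last applied" (term, index) that must only move forward.
 * Names are illustrative; this is not the Ratis StateMachineUpdater implementation.
 */
final class LastAppliedSketch {

  /** A (term, index) pair, ordered by term first and then by index. */
  static final class TermIndex implements Comparable<TermIndex> {
    final long term;
    final long index;

    TermIndex(long term, long index) {
      this.term = term;
      this.index = index;
    }

    @Override
    public int compareTo(TermIndex other) {
      int byTerm = Long.compare(term, other.term);
      return byTerm != 0 ? byTerm : Long.compare(index, other.index);
    }

    @Override
    public String toString() {
      return "(t:" + term + ", i:" + index + ")";
    }
  }

  /** Rejects any update that would move the last applied TermIndex backwards. */
  static final class LastAppliedGuard {
    private TermIndex lastApplied = new TermIndex(0, -1);

    // synchronized makes each update atomic, but it cannot stop a second thread
    // from legally advancing the value ahead of the updater's log position.
    synchronized void update(TermIndex newTI) {
      if (newTI.compareTo(lastApplied) < 0) {
        throw new IllegalStateException("Failed updateLastAppliedTermIndex: newTI = "
            + newTI + " < oldTI = " + lastApplied);
      }
      lastApplied = newTI;
    }

    synchronized TermIndex get() {
      return lastApplied;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    LastAppliedGuard guard = new LastAppliedGuard();
    guard.update(new TermIndex(2, 25)); // applied in log order

    // Out-of-band update from a non-updater thread, as the test does.
    Thread listener = new Thread(() -> guard.update(new TermIndex(2, 128)), "Listener");
    listener.start();
    listener.join();

    // The updater then applies the next log entry in order and trips the check.
    Thread updater = new Thread(() -> {
      try {
        guard.update(new TermIndex(2, 26));
      } catch (IllegalStateException e) {
        // prints: group-StateMachineUpdater: Failed updateLastAppliedTermIndex: newTI = (t:2, i:26) < oldTI = (t:2, i:128)
        System.out.println(Thread.currentThread().getName() + ": " + e.getMessage());
      }
    }, "group-StateMachineUpdater");
    updater.start();
    updater.join();
  }
}
```

Making {{update}} synchronized does not help here: each call is atomic, but the ordering problem comes from a second writer, not from torn writes.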
In the run above, this led to a different test error:
{code}
Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 169.502 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA
testInstallOldCheckpointFailure Time elapsed: 49.722 s <<< ERROR!
java.lang.IllegalThreadStateException
	at java.lang.Thread.start(Thread.java:708)
	at org.apache.hadoop.ipc.Server.start(Server.java:3396)
	at org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.start(SCMDatanodeProtocolServer.java:185)
	at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.notifyTermIndexUpdated(SCMStateMachine.java:333)
	at org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.testInstallOldCheckpointFailure(TestSCMInstallSnapshotWithHA.java:182)
{code}
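{{Thread.start()}} throws {{IllegalThreadStateException}} whenever the same {{Thread}} object is started a second time, even after it has finished, so this trace suggests that {{notifyTermIndexUpdated}}, called directly from the test, tried to start the datanode protocol server's threads again. The sketch below shows the JDK behavior and a hypothetical idempotent {{start()}} guard; {{StartOnceSketch}} and {{Service}} are illustrative names, not Ozone or Hadoop classes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: a Thread object may be started at most once. */
final class StartOnceSketch {

  /** A service whose start() is idempotent; names are illustrative only. */
  static final class Service {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final Thread worker = new Thread(() -> { }, "service-worker");

    /** Starts the worker on the first call; returns false on repeated calls. */
    boolean start() {
      if (!started.compareAndSet(false, true)) {
        return false; // already started: skip instead of calling Thread.start() again
      }
      worker.start();
      return true;
    }
  }

  public static void main(String[] args) {
    Thread t = new Thread(() -> { });
    t.start();
    try {
      t.start(); // second start of the same Thread object
    } catch (IllegalThreadStateException e) {
      System.out.println("second Thread.start() rejected");
    }

    Service service = new Service();
    System.out.println(service.start()); // true
    System.out.println(service.start()); // false
  }
}
```

The {{compareAndSet}} guard only makes repeated calls harmless; it does not address why the test triggers a second start in the first place.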
* https://github.com/elek/ozone-build-results/blob/master/2021/07/06/8773/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
* https://github.com/elek/ozone-build-results/blob/master/2021/08/16/9678/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
Another possible result is:
{code}
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 175.451 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA
testInstallOldCheckpointFailure Time elapsed: 50.014 s <<< FAILURE!
java.lang.AssertionError: expected:<(t:2, i:128)> but was:<(t:2, i:28)>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:120)
	at org.junit.Assert.assertEquals(Assert.java:146)
	at org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.testInstallOldCheckpointFailure(TestSCMInstallSnapshotWithHA.java:207)
{code}
* https://github.com/elek/ozone-build-results/blob/master/2021/07/03/8750/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
* https://github.com/elek/ozone-build-results/blob/master/2021/07/07/8781/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
* https://github.com/elek/ozone-build-results/blob/master/2021/07/15/8964/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
* https://github.com/elek/ozone-build-results/blob/master/2021/08/27/10014/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
* https://github.com/elek/ozone-build-results/blob/master/2021/09/05/10120/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
> Intermittent test failure in TestSCMInstallSnapshotWithHA due to System.exit call
> ---------------------------------------------------------------------------------
>
> Key: HDDS-5631
> URL: https://issues.apache.org/jira/browse/HDDS-5631
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Assignee: Attila Doroszlai
> Priority: Blocker
> Attachments: it-ozone.zip
>
>
> Failed run:
> [https://github.com/apache/ozone/pull/2549/checks?check_run_id=3355822776]
>
> Error from the attached log bundle:
> {code}
> 2021-08-18 01:11:25,849 [9151209e-0687-4e1f-91d3-18f4f59985d6@group-24D12FE5568F-StateMachineUpdater] ERROR statemachine.StateMachine (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Updating DB buffer transaction info by an older transaction info, current: 2#128, updating to: 2#29
> 2021-08-18 01:11:25,849 [9151209e-0687-4e1f-91d3-18f4f59985d6@group-24D12FE5568F-StateMachineUpdater] ERROR statemachine.StateMachine (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Updating DB buffer transaction info by an older transaction info, current: 2#128, updating to: 2#29
> java.lang.IllegalArgumentException: Updating DB buffer transaction info by an older transaction info, current: 2#128, updating to: 2#29
> 	at org.apache.hadoop.hdds.scm.ha.SCMHADBTransactionBufferImpl.updateLatestTrxInfo(SCMHADBTransactionBufferImpl.java:71)
> 	at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:146)
> 	at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1691)
> 	at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:234)
> 	at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:179)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
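For the quoted failure, a minimal sketch of the same pattern on the SCM side, assuming transaction info compares by term and then index and prints as {{term#index}}: the buffer rejects an older update with {{IllegalArgumentException}}, and the apply path turns that into an exit. Routing the exit through an injected {{IntConsumer}} (a hypothetical hook; per the trace above the real code goes through {{ExitUtils.terminate}}) lets a test record the status instead of the JVM calling {{System.exit}}. {{TrxInfoSketch}}, {{TrxInfo}}, {{Buffer}} and {{apply}} are illustrative names only.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical sketch of a transaction-info buffer that refuses older updates and of an
 * apply path that turns the failure into an exit. Not the Ozone SCM implementation.
 */
final class TrxInfoSketch {

  /** Transaction info rendered as "term#index", as in the log above. */
  static final class TrxInfo implements Comparable<TrxInfo> {
    final long term;
    final long index;

    TrxInfo(long term, long index) {
      this.term = term;
      this.index = index;
    }

    @Override
    public int compareTo(TrxInfo o) {
      int byTerm = Long.compare(term, o.term);
      return byTerm != 0 ? byTerm : Long.compare(index, o.index);
    }

    @Override
    public String toString() {
      return term + "#" + index;
    }
  }

  /** Keeps the latest transaction info and rejects anything older. */
  static final class Buffer {
    private TrxInfo latest = new TrxInfo(0, 0);

    synchronized void updateLatestTrxInfo(TrxInfo info) {
      if (info.compareTo(latest) < 0) {
        throw new IllegalArgumentException("Updating DB buffer transaction info by an older"
            + " transaction info, current: " + latest + ", updating to: " + info);
      }
      latest = info;
    }
  }

  /** Production code might pass System::exit; a test can pass a recorder instead. */
  static void apply(Buffer buffer, TrxInfo info, IntConsumer exit) {
    try {
      buffer.updateLatestTrxInfo(info);
    } catch (IllegalArgumentException e) {
      System.err.println("Terminating with exit status 1: " + e.getMessage());
      exit.accept(1);
    }
  }

  public static void main(String[] args) {
    Buffer buffer = new Buffer();
    buffer.updateLatestTrxInfo(new TrxInfo(2, 128)); // advanced out of band
    int[] status = {0};
    apply(buffer, new TrxInfo(2, 29), s -> status[0] = s); // in-order apply from the updater
    System.out.println("recorded exit status: " + status[0]); // recorded exit status: 1
  }
}
```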
--
This message was sent by Atlassian Jira
(v8.3.4#803005)