[ 
https://issues.apache.org/jira/browse/HDDS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414971#comment-17414971
 ] 

Attila Doroszlai commented on HDDS-5631:
----------------------------------------

I think this is a test bug: {{testInstallOldCheckpointFailure}} updates the 
state machine's term/index from another thread rather than from the 
{{*-StateMachineUpdater}} thread:

{code}
2021-09-14 12:29:25,704 [Listener at 127.0.0.1/11932] INFO  ha.SCMHADBTransactionBufferImpl (SCMHADBTransactionBufferImpl.java:updateLatestTrxInfo(81)) - ZZZ Update latest tx info: 2#25 -> 2#128
...
2021-09-14 12:29:25,706 [949a75fa-0f39-4afc-b038-8df849b8be69@group-D0B2DE51B071-StateMachineUpdater] ERROR impl.StateMachineUpdater (StateMachineUpdater.java:run(191)) - 949a75fa-0f39-4afc-b038-8df849b8be69@group-D0B2DE51B071-StateMachineUpdater caught a Throwable.
java.lang.IllegalStateException: 949a75fa-0f39-4afc-b038-8df849b8be69: Failed updateLastAppliedTermIndex: newTI = (t:2, i:26) < oldTI = (t:2, i:128)
{code}
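
To make the ordering problem concrete, here is a minimal, self-contained sketch of the same pattern. It is not the Ozone or Ratis code: {{TermIndex}}, {{Tracker}} and the thread names are hypothetical stand-ins, and the only assumption carried over from the log is that the last-applied term/index must never move backwards. One thread jumps the value to 2#128, then the thread that applies log entries in order tries to record 2#26 and is rejected.

```java
import java.util.concurrent.atomic.AtomicReference;

public class TermIndexRaceSketch {

    /** Hypothetical stand-in for a Raft (term, index) pair. */
    static final class TermIndex implements Comparable<TermIndex> {
        final long term;
        final long index;

        TermIndex(long term, long index) {
            this.term = term;
            this.index = index;
        }

        @Override
        public int compareTo(TermIndex other) {
            int byTerm = Long.compare(term, other.term);
            return byTerm != 0 ? byTerm : Long.compare(index, other.index);
        }

        @Override
        public String toString() {
            return "(t:" + term + ", i:" + index + ")";
        }
    }

    /** Hypothetical tracker that refuses to move backwards. */
    static final class Tracker {
        private TermIndex last;

        Tracker(TermIndex initial) {
            this.last = initial;
        }

        synchronized void update(TermIndex next) {
            if (next.compareTo(last) < 0) {
                throw new IllegalStateException(
                    "newTI = " + next + " < oldTI = " + last);
            }
            last = next;
        }
    }

    private static void startAndJoin(Thread thread) {
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    /** Returns the failure message seen by the in-order "updater" thread. */
    static String run() {
        Tracker tracker = new Tracker(new TermIndex(2, 25));

        // Some other thread (the test, in this sketch) jumps ahead.
        startAndJoin(new Thread(
            () -> tracker.update(new TermIndex(2, 128)), "other-thread"));

        // The updater then applies the next log entry in order and fails.
        AtomicReference<String> failure = new AtomicReference<>();
        startAndJoin(new Thread(() -> {
            try {
                tracker.update(new TermIndex(2, 26));
            } catch (IllegalStateException e) {
                failure.set(e.getMessage());
            }
        }, "updater-sketch"));

        return failure.get();
    }

    public static void main(String[] args) {
        // prints: newTI = (t:2, i:26) < oldTI = (t:2, i:128)
        System.out.println(run());
    }
}
```

The two threads are joined one after the other so that the sketch is deterministic. In the real test the interleaving depends on timing, which would be consistent with the failure being intermittent.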

In the above case this resulted in a different test error:

{code}
Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 169.502 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA
testInstallOldCheckpointFailure  Time elapsed: 49.722 s  <<< ERROR!
java.lang.IllegalThreadStateException
        at java.lang.Thread.start(Thread.java:708)
        at org.apache.hadoop.ipc.Server.start(Server.java:3396)
        at org.apache.hadoop.hdds.scm.server.SCMDatanodeProtocolServer.start(SCMDatanodeProtocolServer.java:185)
        at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.notifyTermIndexUpdated(SCMStateMachine.java:333)
        at org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.testInstallOldCheckpointFailure(TestSCMInstallSnapshotWithHA.java:182)
{code}

* https://github.com/elek/ozone-build-results/blob/master/2021/07/06/8773/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
* https://github.com/elek/ozone-build-results/blob/master/2021/08/16/9678/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
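
For readers unfamiliar with this exception: {{Thread.start()}} throws {{IllegalThreadStateException}} when it is called on a thread that has already been started. Whether that is exactly what happens inside {{Server.start}} here is an assumption based only on the stack trace. The sketch below shows just the JDK behaviour, with a hypothetical class name and thread name.

```java
public class ThreadRestartSketch {

    /** Returns true if the second start() call is rejected by the JDK. */
    static boolean startTwiceFails() {
        Thread worker = new Thread(() -> { }, "listener-sketch");
        worker.start();
        try {
            // A Thread object can be started at most once, even after it
            // has finished running.
            worker.start();
            return false;
        } catch (IllegalThreadStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // prints: second start rejected: true
        System.out.println("second start rejected: " + startTwiceFails());
    }
}
```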

Another possible result is:

{code}
Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 175.451 s <<< FAILURE! - in org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA
testInstallOldCheckpointFailure  Time elapsed: 50.014 s  <<< FAILURE!
java.lang.AssertionError: expected:<(t:2, i:128)> but was:<(t:2, i:28)>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:120)
        at org.junit.Assert.assertEquals(Assert.java:146)
        at org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.testInstallOldCheckpointFailure(TestSCMInstallSnapshotWithHA.java:207)
{code}

* https://github.com/elek/ozone-build-results/blob/master/2021/07/03/8750/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
* https://github.com/elek/ozone-build-results/blob/master/2021/07/07/8781/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
* https://github.com/elek/ozone-build-results/blob/master/2021/07/15/8964/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
* https://github.com/elek/ozone-build-results/blob/master/2021/08/27/10014/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt
* https://github.com/elek/ozone-build-results/blob/master/2021/09/05/10120/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.scm.TestSCMInstallSnapshotWithHA.txt

> Intermittent test failure in TestSCMInstallSnapshotWithHA due to System.exit call
> ---------------------------------------------------------------------------------
>
>                 Key: HDDS-5631
>                 URL: https://issues.apache.org/jira/browse/HDDS-5631
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Assignee: Attila Doroszlai
>            Priority: Blocker
>         Attachments: it-ozone.zip
>
>
> Failed run: 
> [https://github.com/apache/ozone/pull/2549/checks?check_run_id=3355822776]
>  
> Error from the attached log bundle:
> {code}
> 2021-08-18 01:11:25,849 [9151209e-0687-4e1f-91d3-18f4f59985d6@group-24D12FE5568F-StateMachineUpdater] ERROR statemachine.StateMachine (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Updating DB buffer transaction info by an older transaction info, current: 2#128, updating to: 2#29
> java.lang.IllegalArgumentException: Updating DB buffer transaction info by an older transaction info, current: 2#128, updating to: 2#29
>         at org.apache.hadoop.hdds.scm.ha.SCMHADBTransactionBufferImpl.updateLatestTrxInfo(SCMHADBTransactionBufferImpl.java:71)
>         at org.apache.hadoop.hdds.scm.ha.SCMStateMachine.applyTransaction(SCMStateMachine.java:146)
>         at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1691)
>         at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:234)
>         at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:179)
>         at java.lang.Thread.run(Thread.java:748)
> {code}


