[
https://issues.apache.org/jira/browse/HDDS-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363598#comment-17363598
]
Attila Doroszlai commented on HDDS-5345:
----------------------------------------
bq. (One additional change performed is irrespective of SCM HA is to wait for
at least one healthy pipeline if DN's are configured is >=3. So that writes
will succeed, after safemode exit).
I guess this change uncovered the bug; I think its root cause is
HDDS-5348.
bq. This we follow in Docker tests, I have thought it would be better to bring
that to MiniOzoneCluster also.
One difference is that the docker-based tests are configured for more frequent
pipeline creation. Most integration tests use the default 2-minute pipeline
creation interval. Since this matches the timeout for safe mode exit, we
intermittently hit that timeout because no pipeline gets created in time.
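To illustrate the timing, here is a minimal sketch. It is not Ozone code: the
class and method names are hypothetical, and it assumes a simplified model in
which creation attempts run at 0, interval, 2*interval, ... and an attempt only
succeeds once the datanodes are ready. Only the 2-minute interval and timeout
come from the discussion above; the 5-second and 30-second values are made up.

```java
import java.time.Duration;

// Hypothetical model of periodic pipeline creation vs. the safe mode exit timeout.
class PipelineTimingSketch {

  // Time of the first periodic attempt that runs at or after the datanodes are ready,
  // assuming attempts run at 0, interval, 2*interval, ...
  static Duration firstSuccessfulAttempt(Duration interval, Duration datanodesReady) {
    long intervalMs = interval.toMillis();
    long readyMs = datanodesReady.toMillis();
    long attempts = (readyMs + intervalMs - 1) / intervalMs; // ceiling division
    return Duration.ofMillis(attempts * intervalMs);
  }

  // True if a pipeline exists strictly before the safe mode wait times out.
  static boolean exitsSafeModeInTime(Duration interval, Duration datanodesReady,
      Duration timeout) {
    return firstSuccessfulAttempt(interval, datanodesReady).compareTo(timeout) < 0;
  }

  public static void main(String[] args) {
    Duration timeout = Duration.ofMinutes(2);
    Duration ready = Duration.ofSeconds(5); // assumed datanode registration delay

    // Default-like interval (2 minutes): the next attempt lands on the deadline.
    System.out.println(exitsSafeModeInTime(Duration.ofMinutes(2), ready, timeout));

    // Shorter, docker-test-like interval (30 seconds is an assumed value).
    System.out.println(exitsSafeModeInTime(Duration.ofSeconds(30), ready, timeout));
  }
}
```

In this model the outcome with the 2-minute interval depends on whether the
datanodes happen to be ready before the very first attempt, which would be
consistent with the failure being intermittent.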
bq. Could you share some links/logs related to this?
Items in the description are paths in the [build results
repo|https://github.com/elek/ozone-build-results/]. Direct links:
https://github.com/elek/ozone-build-results/blob/master/2021/06/11/8401/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
https://github.com/elek/ozone-build-results/blob/master/2021/06/11/8408/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
https://github.com/elek/ozone-build-results/blob/master/2021/06/14/8429/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAMetadataOnly.txt
https://github.com/elek/ozone-build-results/blob/master/2021/06/14/8429/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithACL.txt
https://github.com/elek/ozone-build-results/blob/master/2021/06/14/8434/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
> Intermittent timeout in TestOzoneManagerHA.init
> -----------------------------------------------
>
> Key: HDDS-5345
> URL: https://issues.apache.org/jira/browse/HDDS-5345
> Project: Apache Ozone
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Attila Doroszlai
> Priority: Critical
>
> {{TestOzoneManagerHA*}} intermittently fails to start the mini cluster,
> probably since
> {code}
> HDDS-5263. SCM may stay in safe mode forever after a unclean shutdown of SCM.
> (#2294)
> {code}
> {noformat}
> 2021/06/11/8401/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
> 2021/06/11/8408/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
> 2021/06/14/8429/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAMetadataOnly.txt
> 2021/06/14/8429/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithACL.txt
> 2021/06/14/8434/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
> {noformat}
> CC [~bharat]