[ 
https://issues.apache.org/jira/browse/HDDS-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363598#comment-17363598
 ] 

Attila Doroszlai commented on HDDS-5345:
----------------------------------------

bq. (One additional change, irrespective of SCM HA, is to wait for at least 
one healthy pipeline if >= 3 DNs are configured, so that writes will succeed 
after safe mode exit.)

I guess this change uncovered the bug, the root cause of which I think is 
HDDS-5348.

bq. We follow this in Docker tests; I thought it would be better to bring 
that to MiniOzoneCluster also.

One difference is that docker-based tests are configured for more frequent 
pipeline creation.  Most integration tests use the default 2-minute pipeline 
creation interval.  Since this matches the timeout for safe mode exit, we 
intermittently hit the timeout because no pipeline gets created in time.
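
The timing argument above can be sketched with a small, simplified model.  
This is not Ozone code: the class and method names are hypothetical, it 
assumes pipeline creation runs at fixed multiples of the interval starting 
at time 0, and the 10-second interval stands in for "more frequent" 
creation (the actual docker setting is not stated here).

```java
public class PipelineTimingSketch {

    /**
     * Time of the first periodic run (at 0, interval, 2*interval, ...)
     * that happens at or after the datanodes are ready.
     * Assumption: a pipeline can only be created on such a run.
     */
    static long firstRunAtOrAfter(long readySeconds, long intervalSeconds) {
        // Ceiling division: number of whole intervals needed to reach readySeconds.
        long runs = (readySeconds + intervalSeconds - 1) / intervalSeconds;
        return runs * intervalSeconds;
    }

    /** True if that run happens strictly before the safe mode exit wait times out. */
    static boolean exitsSafeModeInTime(long readySeconds, long intervalSeconds,
                                       long timeoutSeconds) {
        return firstRunAtOrAfter(readySeconds, intervalSeconds) < timeoutSeconds;
    }

    public static void main(String[] args) {
        long timeout = 120; // safe mode exit timeout, equal to the default interval

        // Datanodes ready before the first run: a pipeline is created right away.
        System.out.println(exitsSafeModeInTime(0, 120, timeout));
        // Datanodes ready just after the first run: next run coincides with the timeout.
        System.out.println(exitsSafeModeInTime(5, 120, timeout));
        // Same readiness, hypothetical shorter interval: created well before the timeout.
        System.out.println(exitsSafeModeInTime(5, 10, timeout));
    }
}
```

In this model the outcome with the 2-minute interval depends only on 
whether the datanodes are ready before the first run, which would be 
consistent with the failure being intermittent.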

bq. Could you share some links/logs related to this?

Items in the description are paths in the [build results 
repo|https://github.com/elek/ozone-build-results/].  Direct links:

https://github.com/elek/ozone-build-results/blob/master/2021/06/11/8401/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
https://github.com/elek/ozone-build-results/blob/master/2021/06/11/8408/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
https://github.com/elek/ozone-build-results/blob/master/2021/06/14/8429/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAMetadataOnly.txt
https://github.com/elek/ozone-build-results/blob/master/2021/06/14/8429/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithACL.txt
https://github.com/elek/ozone-build-results/blob/master/2021/06/14/8434/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt

> Intermittent timeout in TestOzoneManagerHA.init
> -----------------------------------------------
>
>                 Key: HDDS-5345
>                 URL: https://issues.apache.org/jira/browse/HDDS-5345
>             Project: Apache Ozone
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Attila Doroszlai
>            Priority: Critical
>
> {{TestOzoneManagerHA*}} intermittently fails to start the mini cluster, 
> probably since 
> {code}
> HDDS-5263. SCM may stay in safe mode forever after a unclean shutdown of SCM. 
> (#2294)
> {code}
> {noformat}
> 2021/06/11/8401/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
> 2021/06/11/8408/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
> 2021/06/14/8429/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAMetadataOnly.txt
> 2021/06/14/8429/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithACL.txt
> 2021/06/14/8434/it-ozone/hadoop-ozone/integration-test/org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData.txt
> {noformat}
> CC [~bharat]


