[ 
https://issues.apache.org/jira/browse/HDDS-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444189#comment-17444189
 ] 

Ethan Rose commented on HDDS-5971:
----------------------------------

Thanks for reporting this [~smeng]. This will be difficult to track down 
without logs. Next time you see this failure, can you attach the log bundle to 
this Jira? The logs on the linked run have already expired.

> TestHDDSUpgrade hitting maven global test timeout
> -------------------------------------------------
>
>                 Key: HDDS-5971
>                 URL: https://issues.apache.org/jira/browse/HDDS-5971
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Siyao Meng
>            Priority: Major
>         Attachments: screenshot-1.jpg
>
>
> {{TestHDDSUpgrade}} is frequently hitting the Maven global test timeout 
> threshold (about 1 hr), causing {{integration (filesystem-hdds)}} to fail. The 
> class's JUnit timeout is set to 11000000ms (3 hrs+).
> I've seen this at least 3 times recently for new PR CI runs. We need to 
> investigate why some test cases can become stuck for so long. I ran the test 
> class locally with IntelliJ and it finished in 5 min 55 sec:
>  !screenshot-1.jpg! 
> CC [~avijayan] [~erose]
> Failing run:
> https://github.com/apache/ozone/runs/4160837361
> Found this in the above run's artifact bundle: {{No healthy node found to 
> allocate container}}?
> {code:title=org.apache.hadoop.hdds.upgrade.TestHDDSUpgrade-output.txt}
> 2021-11-10 04:46:13,552 [Time-limited test] INFO  upgrade.UpgradeFinalizer 
> (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least 
> one open pipeline after SCM finalization.
> 2021-11-10 04:46:18,553 [Time-limited test] INFO  upgrade.UpgradeFinalizer 
> (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least 
> one open pipeline after SCM finalization.
> 2021-11-10 04:46:18,569 [RatisPipelineUtilsThread - 0] ERROR 
> scm.SCMCommonPlacementPolicy 
> (SCMCommonPlacementPolicy.java:filterNodesWithSpace(171)) - Unable to find 
> enough nodes that meet the space requirement of 1073741824 bytes for metadata 
> and 5368709120 bytes for data in healthy node set. Required 3. Found 2.
> 2021-11-10 04:46:23,553 [Time-limited test] INFO  upgrade.UpgradeFinalizer 
> (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least 
> one open pipeline after SCM finalization.
> 2021-11-10 04:46:24,033 [ReplicationMonitor] ERROR 
> scm.SCMCommonPlacementPolicy 
> (SCMCommonPlacementPolicy.java:chooseDatanodes(140)) - No healthy node found 
> to allocate container.
> 2021-11-10 04:46:24,033 [ReplicationMonitor] WARN  
> container.ReplicationManager 
> (ReplicationManager.java:handleUnderReplicatedContainer(1199)) - Exception 
> while replicating container 2.
> org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to 
> allocate container.
>       at 
> org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:141)
>       at 
> org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodes(SCMContainerPlacementRandom.java:78)
>       at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:1163)
>       at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:519)
>       at java.util.ArrayList.forEach(ArrayList.java:1259)
>       at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.processAll(ReplicationManager.java:369)
>       at 
> org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:383)
>       at java.lang.Thread.run(Thread.java:748)
> 2021-11-10 04:46:24,033 [ReplicationMonitor] INFO  
> container.ReplicationManager (ReplicationManager.java:processAll(371)) - 
> Replication Monitor Thread took 3 milliseconds for processing 2 containers.
> 2021-11-10 04:46:28,554 [Time-limited test] INFO  upgrade.UpgradeFinalizer 
> (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least 
> one open pipeline after SCM finalization.
> 2021-11-10 04:46:33,556 [Time-limited test] INFO  upgrade.UpgradeFinalizer 
> (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least 
> one open pipeline after SCM finalization.
> {code}


