[jira] [Updated] (HDDS-5971) TestHDDSUpgrade hitting maven global test timeout

Siyao Meng (Jira) Wed, 10 Nov 2021 19:51:08 -0800


[ https://issues.apache.org/jira/browse/HDDS-5971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Siyao Meng updated HDDS-5971:
-----------------------------
    Description: 
{{TestHDDSUpgrade}} is frequently hitting the Maven global test timeout 
threshold (about 1 hr), causing {{integration (filesystem-hdds)}} to fail. The 
class's JUnit timeout is set to 11000000 ms (3 hrs+), so it can never fire 
before the Maven timeout does.
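
For reference, a minimal Java sketch of the timeout arithmetic above using 
{{java.time.Duration}}. The 60-minute Maven value is an assumption based on 
"about 1 hr" (the actual CI setting may differ), and {{TimeoutCheck}} is a 
hypothetical name used only for illustration:

{code:java}
import java.time.Duration;

/**
 * Compares the class-level JUnit timeout quoted in this issue with the
 * approximate Maven global test timeout. ASSUMPTION: the Maven timeout is
 * taken as exactly 60 minutes ("about 1 hr"); the real CI value may differ.
 */
public class TimeoutCheck {
    // JUnit timeout configured for TestHDDSUpgrade, as stated in the issue.
    public static final Duration JUNIT_TIMEOUT = Duration.ofMillis(11_000_000L);
    // Assumed Maven global test timeout ("about 1 hr").
    public static final Duration MAVEN_TIMEOUT = Duration.ofMinutes(60);

    /** True if the JUnit timeout is longer than the Maven timeout, i.e. it can never fire first. */
    public static boolean junitTimeoutIsUnreachable() {
        return JUNIT_TIMEOUT.compareTo(MAVEN_TIMEOUT) > 0;
    }

    public static void main(String[] args) {
        System.out.println("JUnit timeout: " + JUNIT_TIMEOUT); // PT3H3M20S
        System.out.println("Maven timeout: " + MAVEN_TIMEOUT); // PT1H
        System.out.println("JUnit timeout unreachable: " + junitTimeoutIsUnreachable()); // true
    }
}
{code}

Under that assumption, 11000000 ms works out to 3 h 3 min 20 s, so a stuck 
test case is only ever stopped by the Maven timeout.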

I've seen this at least 3 times recently in CI runs for new PRs. Need to 
investigate why some test cases can get stuck for so long. I ran the test 
class locally with IntelliJ and it finished in 5 min 55 sec:

 !screenshot-1.jpg! 

CC [~avijayan] [~erose]

Failing run:

https://github.com/apache/ozone/runs/4160837361

Found this in the above run's artifact bundle: {{No healthy node found to 
allocate container}}. Could this be the cause?

{code:title=org.apache.hadoop.hdds.upgrade.TestHDDSUpgrade-output.txt}
2021-11-10 04:46:13,552 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
2021-11-10 04:46:18,553 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
2021-11-10 04:46:18,569 [RatisPipelineUtilsThread - 0] ERROR scm.SCMCommonPlacementPolicy (SCMCommonPlacementPolicy.java:filterNodesWithSpace(171)) - Unable to find enough nodes that meet the space requirement of 1073741824 bytes for metadata and 5368709120 bytes for data in healthy node set. Required 3. Found 2.
2021-11-10 04:46:23,553 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
2021-11-10 04:46:24,033 [ReplicationMonitor] ERROR scm.SCMCommonPlacementPolicy (SCMCommonPlacementPolicy.java:chooseDatanodes(140)) - No healthy node found to allocate container.
2021-11-10 04:46:24,033 [ReplicationMonitor] WARN  container.ReplicationManager (ReplicationManager.java:handleUnderReplicatedContainer(1199)) - Exception while replicating container 2.
org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
        at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:141)
        at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodes(SCMContainerPlacementRandom.java:78)
        at org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:1163)
        at org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:519)
        at java.util.ArrayList.forEach(ArrayList.java:1259)
        at org.apache.hadoop.hdds.scm.container.ReplicationManager.processAll(ReplicationManager.java:369)
        at org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:383)
        at java.lang.Thread.run(Thread.java:748)
2021-11-10 04:46:24,033 [ReplicationMonitor] INFO  container.ReplicationManager (ReplicationManager.java:processAll(371)) - Replication Monitor Thread took 3 milliseconds for processing 2 containers.
2021-11-10 04:46:28,554 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.
2021-11-10 04:46:33,556 [Time-limited test] INFO  upgrade.UpgradeFinalizer (SCMUpgradeFinalizer.java:postFinalizeUpgrade(115)) - Waiting for at least one open pipeline after SCM finalization.

{code}
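
One possible reading of the log, sketched in Java: the pipeline needs 3 
datanodes ("Required 3"), but only 2 healthy nodes meet the 1073741824-byte 
(1 GiB) metadata and 5368709120-byte (5 GiB) data requirements ("Found 2"), 
so no pipeline opens and {{postFinalizeUpgrade}} keeps logging "Waiting for at 
least one open pipeline" roughly every 5 s. This is a hypothetical model, not 
Ozone code: {{PipelineWaitSketch}}, {{Node}}, {{eligibleNodes}} and 
{{waitFor}} are illustrative names, the node sizes are made up, and the 
bounded wait only shows how a deadline would turn the hang into a fast failure:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

/** HYPOTHETICAL model of the failure in the log; names are illustrative, not Ozone APIs. */
public class PipelineWaitSketch {
    public static final long METADATA_BYTES = 1_073_741_824L; // 1 GiB, from the log
    public static final long DATA_BYTES = 5_368_709_120L;     // 5 GiB, from the log
    public static final int REQUIRED_NODES = 3;               // "Required 3" in the log

    /** Hypothetical view of a datanode's free space. */
    public static final class Node {
        public final String name;
        public final long freeMetadataBytes;
        public final long freeDataBytes;

        public Node(String name, long freeMetadataBytes, long freeDataBytes) {
            this.name = name;
            this.freeMetadataBytes = freeMetadataBytes;
            this.freeDataBytes = freeDataBytes;
        }
    }

    /** Keeps only the nodes that meet both space requirements. */
    public static List<Node> eligibleNodes(List<Node> healthy) {
        List<Node> result = new ArrayList<>();
        for (Node n : healthy) {
            if (n.freeMetadataBytes >= METADATA_BYTES && n.freeDataBytes >= DATA_BYTES) {
                result.add(n);
            }
        }
        return result;
    }

    /** A pipeline can only be formed if enough nodes qualify. */
    public static boolean canCreatePipeline(List<Node> healthy) {
        return eligibleNodes(healthy).size() >= REQUIRED_NODES;
    }

    /** Polls every intervalMillis; returns false once timeoutMillis elapses instead of looping forever. */
    public static boolean waitFor(BooleanSupplier condition, long intervalMillis, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline passed, give up
            }
            Thread.sleep(intervalMillis);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Made-up cluster state matching "Required 3. Found 2.": dn3 is short on data space.
        List<Node> healthy = Arrays.asList(
            new Node("dn1", 2 * METADATA_BYTES, 2 * DATA_BYTES),
            new Node("dn2", 2 * METADATA_BYTES, 2 * DATA_BYTES),
            new Node("dn3", 2 * METADATA_BYTES, DATA_BYTES / 2));
        System.out.println("Eligible nodes: " + eligibleNodes(healthy).size()); // 2
        // Short interval and timeout so the sketch finishes quickly; the log shows ~5 s polling.
        boolean opened = waitFor(() -> canCreatePipeline(healthy), 10, 100);
        System.out.println("Pipeline opened before deadline: " + opened); // false
    }
}
{code}

If this reading is right, the condition never becomes true, so only an 
explicit deadline (or the Maven timeout) ends the wait.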

  was:
{{TestHDDSUpgrade}} is frequently hitting the Maven global test timeout 
threshold (about 1 hr), causing {{integration (filesystem-hdds)}} to fail. The 
class's JUnit timeout is set to 11000000 ms (3 hrs+).

I've seen this at least 3 times recently in CI runs for new PRs. Need to 
investigate why some test cases can get stuck for so long. I ran the test 
class locally with IntelliJ and it finished in 5 min 55 sec:

 !screenshot-1.jpg! 

CC [~avijayan] [~erose]

Failing run(s):

https://github.com/apache/ozone/runs/4160837361


> TestHDDSUpgrade hitting maven global test timeout
> -------------------------------------------------
>
>                 Key: HDDS-5971
>                 URL: https://issues.apache.org/jira/browse/HDDS-5971
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Siyao Meng
>            Priority: Major
>         Attachments: screenshot-1.jpg



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
