[jira] [Work logged] (HDFS-15716) TestUpgradeDomainBlockPlacementPolicy flaky

ASF GitHub Bot (Jira) Mon, 14 Dec 2020 14:00:37 -0800


     [ 
https://issues.apache.org/jira/browse/HDFS-15716?focusedWorklogId=524136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524136
 ]


ASF GitHub Bot logged work on HDFS-15716:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Dec/20 21:59
            Start Date: 14/Dec/20 21:59
    Worklog Time Spent: 10m 
      Work Description: amahussein commented on pull request #2528:
URL: https://github.com/apache/hadoop/pull/2528#issuecomment-744736690


   > The list of failed unit tests in the last few days is getting worse and 
worse.
   > @amahussein, you've been making lots of fixes in the last month; any idea 
why this is suddenly getting so bad?
   
   Thanks @goiri. 
   I took a look at the latest build:
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/355/
   
   ```bash
   Test Result (23 failures / -45)

   org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithStripedFile
   org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithIncludeListWithPorts
   org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithSortTopNodes
   org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerCliWithIncludeListWithPorts
   org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks.testSetRepIncWithUnderReplicatedBlocks
   org.apache.hadoop.hdfs.server.namenode.TestFsck.testECFsck
   org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeDecommision
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testUpdateAppStateXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppQueueXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppTimeoutsXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testGetContainersXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testUpdateAppPriorityXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testUpdateAppQueueXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppTimeoutXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppAttemptXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testGetAppAttemptXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppStateXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppPriorityXML
   org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testGetAppsMultiThread
   org.apache.hadoop.tools.dynamometer.TestDynamometerInfra.org.apache.hadoop.tools.dynamometer.TestDynamometerInfra
   org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers
   org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType
   ```
   
   - TestFsck: could be failing after the installation of intel-ISA.
   - TestUnderReplicatedBlocks: I remember seeing this unit test fail before.
   - TestBalancer: Interesting that there are several failures. I haven't 
looked into them yet. I guess there is a race condition somewhere in the code 
path.
   - TestRouterWebServicesREST, TestDynamometerInfra, TestDistributedShell: 
these have been failing for some time now.
   
   By the way, I found that TestDistributedShell does not clean up at all. The 
problem is that the two failing unit tests leave several processes running for 
some time. This could be one of the reasons the system crashes, since the 
background containers consume memory and CPU resources.
   I am going to address that soon. Hopefully this will improve the stability 
of the overall Yetus execution.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 524136)
    Time Spent: 1h 50m  (was: 1h 40m)

> TestUpgradeDomainBlockPlacementPolicy flaky
> -------------------------------------------
>
>                 Key: HDFS-15716
>                 URL: https://issues.apache.org/jira/browse/HDFS-15716
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode, test
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In some slow runs {{TestUpgradeDomainBlockPlacementPolicy#testPlacement}} and 
> {{TestUpgradeDomainBlockPlacementPolicy#testPlacementAfterDecommission}} fail.
> On branch-2.10, this was fixed by waiting for the replication to be complete. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
