[ 
https://issues.apache.org/jira/browse/HDDS-1332?focusedWorklogId=223193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-223193
 ]

ASF GitHub Bot logged work on HDDS-1332:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Apr/19 18:44
            Start Date: 04/Apr/19 18:44
    Worklog Time Spent: 10m 
      Work Description: adoroszlai commented on pull request #697: [HDDS-1332] 
Attempt to fix flaky test testStartStopDatanodeStateMachine
URL: https://github.com/apache/hadoop/pull/697
 
 
   ## What changes were proposed in this pull request?
   
   `testStartStopDatanodeStateMachine` is flaky, causing [occasional pre-commit build failures](https://builds.apache.org/job/hadoop-multibranch/job/PR-691/1/artifact/out/patch-unit-hadoop-hdds_container-service.txt).
   [HDDS-1332](https://issues.apache.org/jira/browse/HDDS-1332) added some 
logging to find out more about the cause.
   
   I think the problem is not test-specific and is caused by the following: 
`SCMConnectionManager#scmMachines` is a plain `HashMap`, guarded by a 
`ReadWriteLock` in most places where it is used, but not in `getValues()`.  That 
method also returns the map's values collection directly, with no protection 
against modification (though currently none of the callers modify it).
   
   This is an attempt to fix the cause by acquiring the read lock and creating 
a read-only copy.
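
A minimal sketch of that idea, not the actual patch: `ConnectionRegistry`, its `String` value type, and the `add` method are hypothetical stand-ins for `SCMConnectionManager` and its endpoint type. Only the pattern (take the read lock, copy the values, return the copy as read-only) comes from the description above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for SCMConnectionManager; names and types are assumptions.
class ConnectionRegistry {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // Plain HashMap: every access must hold the lock.
  private final Map<String, String> machines = new HashMap<>();

  void add(String address, String endpoint) {
    lock.writeLock().lock();
    try {
      machines.put(address, endpoint);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Copies the values while holding the read lock, so the caller never
  // iterates the live map view, and wraps the copy so it cannot be modified.
  Collection<String> getValues() {
    lock.readLock().lock();
    try {
      return Collections.unmodifiableList(new ArrayList<>(machines.values()));
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

The returned collection is a snapshot: later calls to `add` do not show up in it, and callers that try to modify it get an `UnsupportedOperationException`.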
   
   https://issues.apache.org/jira/browse/HDDS-1332
   
   ## How was this patch tested?
   
   Ran the affected unit tests several times, and also tried the `ozone` 
docker-compose cluster.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 223193)
    Time Spent: 40m  (was: 0.5h)

> Add some logging for flaky test testStartStopDatanodeStateMachine
> -----------------------------------------------------------------
>
>                 Key: HDDS-1332
>                 URL: https://issues.apache.org/jira/browse/HDDS-1332
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.5.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> testStartStopDatanodeStateMachine fails frequently in Jenkins. It also seems 
> to have a timing issue, which may be different from the Jenkins failure.
> For example, if I add a 10-second sleep as below, the test fails 100% of the time.
> {code}
> @@ -163,6 +163,7 @@ public void testStartStopDatanodeStateMachine() throws IOException,
>      try (DatanodeStateMachine stateMachine =
>          new DatanodeStateMachine(getNewDatanodeDetails(), conf, null)) {
>        stateMachine.startDaemon();
> +      Thread.sleep(10_000L);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
