[ https://issues.apache.org/jira/browse/HDDS-1332?focusedWorklogId=223193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-223193 ]
ASF GitHub Bot logged work on HDDS-1332:
----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Apr/19 18:44
Start Date: 04/Apr/19 18:44
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #697: [HDDS-1332]
Attempt to fix flaky test testStartStopDatanodeStateMachine
URL: https://github.com/apache/hadoop/pull/697
## What changes were proposed in this pull request?
`testStartStopDatanodeStateMachine` is flaky, causing [occasional pre-commit build failures](https://builds.apache.org/job/hadoop-multibranch/job/PR-691/1/artifact/out/patch-unit-hadoop-hdds_container-service.txt).
[HDDS-1332](https://issues.apache.org/jira/browse/HDDS-1332) added some
logging to find out more about the cause.
I think the problem is not test-specific, and is caused by the following:
`SCMConnectionManager#scmMachines` is a plain `HashMap`, guarded by a
`ReadWriteLock` in most places where it is used, but not in `getValues()`.
That method also returns the map's values collection without any write
protection (though currently none of the callers modify it).
This change attempts to fix the cause by acquiring the read lock in
`getValues()` and returning a read-only copy.
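
For illustration, here is a minimal sketch of the approach described above. It is not the actual patch: `ConnectionRegistry` and its `add` method are hypothetical stand-ins for `SCMConnectionManager`, and only the plain `HashMap`, the `ReadWriteLock`, and the `getValues()` name are taken from the description.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for SCMConnectionManager; names are illustrative only.
class ConnectionRegistry<K, V> {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // Plain HashMap: every access must happen while holding the lock.
  private final Map<K, V> machines = new HashMap<>();

  // Writers take the write lock before mutating the map.
  void add(K key, V value) {
    lock.writeLock().lock();
    try {
      machines.put(key, value);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Readers take the read lock, copy the values, and return a read-only
  // snapshot, so callers never iterate over (or modify) the live map.
  Collection<V> getValues() {
    lock.readLock().lock();
    try {
      return Collections.unmodifiableList(new ArrayList<>(machines.values()));
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

The copy is made while the read lock is held, so a concurrent writer cannot change the map mid-copy, and the unmodifiable wrapper means callers cannot alter the returned snapshot.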
https://issues.apache.org/jira/browse/HDDS-1332
## How was this patch tested?
Ran affected unit tests several times, plus tried `ozone` docker-compose
cluster.
Issue Time Tracking
-------------------
Worklog Id: (was: 223193)
Time Spent: 40m (was: 0.5h)
> Add some logging for flaky test testStartStopDatanodeStateMachine
> -----------------------------------------------------------------
>
> Key: HDDS-1332
> URL: https://issues.apache.org/jira/browse/HDDS-1332
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> testStartStopDatanodeStateMachine fails frequently in Jenkins. It also seems
> to have a timing issue, which may be different from the Jenkins failure.
> E.g., if I add a 10-second sleep as below, I can get the test to fail 100% of
> the time.
> {code}
> @@ -163,6 +163,7 @@ public void testStartStopDatanodeStateMachine() throws IOException,
>      try (DatanodeStateMachine stateMachine =
>          new DatanodeStateMachine(getNewDatanodeDetails(), conf, null)) {
>        stateMachine.startDaemon();
> +      Thread.sleep(10_000L);
> {code}