[
https://issues.apache.org/jira/browse/HDFS-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010318#comment-15010318
]
Rakesh R commented on HDFS-9435:
--------------------------------
Thanks [~iwasakims] for the interest and useful comments.
I can see that {{#triggerBlockReportForTests}} can again return immediately,
before the block report is acknowledged by the ActiveNN. Below is the sequence:
1=> During startUp(), it calls
{{dn.getAllBpOs().get(0).triggerBlockReportForTests()}}, which initializes
{{final long oldBlockReportTime = scheduler.nextBlockReportTime;}}
2=> BPServiceActor#start().
3=> Starting the actor thread calls
BPServiceActor#connectToNNAndHandshake()
4=> BPServiceActor#register()
5=> scheduler#scheduleBlockReport(dnConf.initialBlockReportDelayMs);
Now, the {{#scheduleBlockReport}} call updates {{nextBlockReportTime =
monotonicNow();}}. This again ends the waiting period of
{{#triggerBlockReportForTests}}, so execution continues to the unit test cases
and falls into a similar error situation.
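To illustrate the sequence above, here is a minimal standalone sketch. It is a simplified model and not the Hadoop code: apart from {{nextBlockReportTime}}, every name in it (the class, {{register}}, {{sendReport}}, {{triggerAndWait}}, the logical clock) is hypothetical. It assumes the wait ends as soon as {{nextBlockReportTime}} differs from the saved value, whatever caused the change.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of the race described above; this is NOT the Hadoop code.
// All names except nextBlockReportTime are hypothetical.
final class TriggerRaceSketch {
    // Logical clock standing in for monotonicNow().
    private final AtomicLong clock = new AtomicLong();
    private final AtomicLong nextBlockReportTime = new AtomicLong();
    private volatile boolean reportSent = false;

    // Models step 5: registration reschedules the report but sends nothing.
    public void register() {
        nextBlockReportTime.set(clock.incrementAndGet());
    }

    // Models a real block report: mark it sent, then reschedule.
    public void sendReport() {
        reportSent = true;
        nextBlockReportTime.set(clock.incrementAndGet());
    }

    // Models the wait: snapshot the time, let the actor run one step, and
    // return as soon as the time changes. Returns whether a report was sent.
    public boolean triggerAndWait(Runnable actorStep) {
        long oldBlockReportTime = nextBlockReportTime.get();
        Thread actor = new Thread(actorStep);
        actor.start();
        try {
            while (nextBlockReportTime.get() == oldBlockReportTime) {
                Thread.sleep(1);
            }
            actor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return reportSent;
    }

    public static void main(String[] args) {
        TriggerRaceSketch s = new TriggerRaceSketch();
        System.out.println("first wait saw a report: " + s.triggerAndWait(s::register));
        System.out.println("second wait saw a report: " + s.triggerAndWait(s::sendReport));
    }
}
```

In this model the first wait is released by the registration step alone, so it returns without any report having been sent. A second wait takes its snapshot after registration and is released only by the report itself.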
IMHO, as you mentioned, calling {{#triggerBlockReportForTests}} twice will make
the tests more consistent. I'm attaching a patch showing the changes; please
review it again. Thanks!
> TestBlockRecovery#testRBWReplicas is failing intermittently
> -----------------------------------------------------------
>
> Key: HDFS-9435
> URL: https://issues.apache.org/jira/browse/HDFS-9435
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Rakesh R
> Assignee: Rakesh R
> Attachments: HDFS-9435-00.patch, HDFS-9435-01.patch,
> testRBWReplicas.log
>
>
> TestBlockRecovery#testRBWReplicas is failing in the [build
> 13536|https://builds.apache.org/job/PreCommit-HDFS-Build/13536/testReport/org.apache.hadoop.hdfs.server.datanode/TestBlockRecovery/testRBWReplicas/].
> It looks like a bug in the test caused by a race condition.
> Note: Logs taken from the build are attached to this jira.