[ https://issues.apache.org/jira/browse/HDFS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Foley updated HDFS-1806:
-----------------------------
Attachment: blockReport_08_failure_log.html
In the attached log excerpt, from an Apache Hudson/Jenkins QA auto-test run,
replication starts at the datanode at 2:58:09,608.
The datanode moves the replica from tmp to finalized at 2:58:09,659-660.
Then at 2:58:09,663 we see the message "Replication state before the loop 0",
which marks the START of polling -- way too late.
So the waits in both the waitTil(100) and waitTil(50) lines in
waitForTempReplica() are too long.
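As a quick check of the timing above, the sketch below parses the log timestamps quoted in this comment and computes the intervals between them. The class and method names (ReplicaTimeline, millisBetween) are hypothetical and not part of TestBlockReport, and the "H:mm:ss,SSS" pattern assumes the excerpt's timestamp format.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: measures gaps between log timestamps such as "2:58:09,608". */
public class ReplicaTimeline {
    // Assumed log format: hour, minutes, seconds, then milliseconds after a comma.
    private static final DateTimeFormatter LOG_TIME = DateTimeFormatter.ofPattern("H:mm:ss,SSS");

    /** Milliseconds elapsed from one log timestamp to a later one. */
    public static long millisBetween(String from, String to) {
        LocalTime a = LocalTime.parse(from, LOG_TIME);
        LocalTime b = LocalTime.parse(to, LOG_TIME);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        String replicationStart = "2:58:09,608";
        String finalizedBegin = "2:58:09,659"; // tmp -> finalized move logged at 659-660
        String finalizedEnd = "2:58:09,660";
        String firstPoll = "2:58:09,663";      // "Replication state before the loop 0"

        System.out.println("Replica in tmp for about "
                + millisBetween(replicationStart, finalizedBegin) + " ms");
        System.out.println("Polling started "
                + millisBetween(replicationStart, firstPoll) + " ms after replication began, "
                + millisBetween(finalizedEnd, firstPoll) + " ms after finalization");
    }
}
```

By this arithmetic the replica sits in tmp for roughly 51 ms, and the first poll arrives 55 ms after replication starts, 3 ms after the replica has already been finalized.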
> TestBlockReport.blockReport_08() and _09() are timing-dependent and likely to
> fail on fast servers
> --------------------------------------------------------------------------------------------------
>
> Key: HDFS-1806
> URL: https://issues.apache.org/jira/browse/HDFS-1806
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node, name-node
> Affects Versions: 0.22.0
> Reporter: Matt Foley
> Attachments: blockReport_08_failure_log.html
>
>
> Method waitForTempReplica() polls every 100ms during block replication,
> attempting to "catch" a datanode in the state of having a TEMPORARY replica.
> But examination of a current Hudson test failure log shows that the replica
> goes from "start" to "TEMPORARY" to "FINALIZED" in only 50ms, so of course
> the poll usually misses it.
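To illustrate why a 100ms poll misses a window of about 50ms, here is a deterministic model of the race that uses simulated millisecond timestamps instead of real threads. Everything in it (PollRaceModel, ReplicaState, firstTemporaryObservation) is hypothetical and is not the actual waitForTempReplica() code. It assumes the replica is TEMPORARY exactly during [608, 659) ms, as read from the attached log, and treats both the start of polling and the poll interval as parameters.

```java
import java.util.Optional;

/** Hypothetical, deterministic model of polling for a short-lived TEMPORARY replica. */
public class PollRaceModel {
    enum ReplicaState { NONE, TEMPORARY, FINALIZED }

    /** Replica state at simulated time t: TEMPORARY during [tempStart, tempEnd), then FINALIZED. */
    static ReplicaState stateAt(long t, long tempStart, long tempEnd) {
        if (t < tempStart) return ReplicaState.NONE;
        if (t < tempEnd) return ReplicaState.TEMPORARY;
        return ReplicaState.FINALIZED;
    }

    /**
     * Polls at firstPoll, firstPoll + interval, ... up to deadline (inclusive) and returns
     * the time of the first poll that observes TEMPORARY, or empty if every poll misses it.
     */
    static Optional<Long> firstTemporaryObservation(long firstPoll, long interval, long deadline,
                                                    long tempStart, long tempEnd) {
        for (long t = firstPoll; t <= deadline; t += interval) {
            ReplicaState s = stateAt(t, tempStart, tempEnd);
            if (s == ReplicaState.TEMPORARY) return Optional.of(t);
            if (s == ReplicaState.FINALIZED) return Optional.empty(); // window already closed
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        long tempStart = 608, tempEnd = 659; // assumed TEMPORARY window, ms within 2:58:09
        // Polling starts at 663, as in the failing run: the replica is already finalized.
        System.out.println(firstTemporaryObservation(663, 100, 2000, tempStart, tempEnd));
        // Polling starts earlier, but a 100 ms interval steps right over the 51 ms window.
        System.out.println(firstTemporaryObservation(580, 100, 2000, tempStart, tempEnd));
        // Same early start with a 10 ms interval lands a poll inside the window.
        System.out.println(firstTemporaryObservation(580, 10, 2000, tempStart, tempEnd));
    }
}
```

In this model a shorter poll interval helps only if polling also begins before the replica is finalized, which is consistent with the observation that both waits in waitForTempReplica() are too long.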
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira