[ https://issues.apache.org/jira/browse/HDFS-10748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423791#comment-15423791 ]

Yiqun Lin commented on HDFS-10748:
----------------------------------

Thanks [~xyao] for reporting this issue.
It seems HDFS-7886 did not completely fix this issue. See the comment in 
HDFS-7930 (https://issues.apache.org/jira/browse/HDFS-7930?focusedCommentId=14368053&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14368053):
{quote}
Although this will not fix the testTruncateWithDataNodesRestart() completely. 
The location is correctly invalidated on the NN, but then NN postpones 
invalidation on the DN and waits for the next report.
...
If I add triggerBlockReports() before waitReplication() then the test passes, 
as it finally triggers deletion of the replica on the DN.
{quote}
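To make the mechanism in the quoted comment more concrete, here is a toy model. It is a sketch under stated assumptions, not HDFS code: {{ToyCluster}}, {{replicate()}} and {{blockReport()}} are hypothetical names. It assumes that (a) a DataNode still holding a copy of the block, even a stale one, is not chosen as a re-replication target, and (b) the cluster has exactly as many DataNodes as the target replication, so the missing replica can only go to the DataNode that still holds the stale copy.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of postponed invalidation (hypothetical, NOT HDFS code).
// Assumption: a DataNode holding any copy of the block (valid or stale)
// is never picked as a re-replication target.
class ToyCluster {
    private final int numDataNodes;
    private final int targetReplication;
    // DataNodes the NameNode counts as holding a valid replica.
    private final Set<Integer> validReplicas = new HashSet<>();
    // DataNodes still holding a stale replica whose deletion is postponed.
    private final Set<Integer> staleReplicas = new HashSet<>();

    ToyCluster(int numDataNodes, int targetReplication,
               Set<Integer> valid, Set<Integer> stale) {
        this.numDataNodes = numDataNodes;
        this.targetReplication = targetReplication;
        validReplicas.addAll(valid);
        staleReplicas.addAll(stale);
    }

    // Re-replication: fill free DataNodes until the target is reached,
    // skipping any DataNode that already holds a copy of the block.
    void replicate() {
        for (int dn = 0; dn < numDataNodes
                && validReplicas.size() < targetReplication; dn++) {
            if (!validReplicas.contains(dn) && !staleReplicas.contains(dn)) {
                validReplicas.add(dn);
            }
        }
    }

    // Processing a block report lets the postponed deletion go through.
    void blockReport() {
        staleReplicas.clear();
    }

    int liveReplicas() {
        return validReplicas.size();
    }
}

class PostponedInvalidationDemo {
    public static void main(String[] args) {
        // DataNodes 0 and 1 hold valid replicas; restarted DataNode 2 holds a stale one.
        ToyCluster cluster = new ToyCluster(3, 3, Set.of(0, 1), Set.of(2));
        cluster.replicate();
        System.out.println("before block report: " + cluster.liveReplicas()); // 2
        cluster.blockReport(); // plays the role of triggerBlockReports()
        cluster.replicate();
        System.out.println("after block report: " + cluster.liveReplicas()); // 3
    }
}
```

In this model the file can only reach 3 replicas after a block report clears the stale copy, which is consistent with the observation that adding {{triggerBlockReports()}} before {{waitReplication()}} makes the test pass.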
I think the main problem is that the block report has not been fully sent to 
the NameNode, which leaves the cluster waiting for replication.

I ran {{testTruncateWithDataNodesRestart}} in my local environment, and it 
failed about once in every 3 to 5 runs. With the change suggested in that 
comment, all runs passed. So I think adding {{triggerBlockReports()}} makes 
sense for this JIRA.

Attaching a simple patch for this.

> TestFileTruncate#testTruncateWithDataNodesRestart runs sometimes timeout
> ------------------------------------------------------------------------
>
>                 Key: HDFS-10748
>                 URL: https://issues.apache.org/jira/browse/HDFS-10748
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Xiaoyu Yao
>
> This was fixed by HDFS-7886, but some recent [Jenkins Results|https://builds.apache.org/job/PreCommit-HDFS-Build/16390/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
>  show it again: 
> {code}
> Tests run: 18, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 172.025 sec <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
> testTruncateWithDataNodesRestart(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate)  Time elapsed: 43.861 sec  <<< ERROR!
> java.util.concurrent.TimeoutException: Timed out waiting for /test/testTruncateWithDataNodesRestart to reach 3 replicas
>       at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:751)
>       at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestart(TestFileTruncate.java:704)
> {code}
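The {{TimeoutException}} in this trace is thrown from {{DFSTestUtil.waitReplication}}, i.e. the test waits for a bounded time for the file to reach 3 replicas and then gives up. The sketch below shows that general pattern (a bounded polling wait) and why a report triggered before the wait matters. It is not the {{DFSTestUtil}} source: {{BoundedWait}}, {{waitFor}} and the {{reportProcessed}} flag standing in for "the block report has been processed" are hypothetical.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a bounded polling wait; NOT the DFSTestUtil source.
class BoundedWait {
    // Checks the condition up to maxAttempts times, sleeping intervalMs
    // between checks, and throws TimeoutException if it never holds.
    static void waitFor(BooleanSupplier condition, long intervalMs,
                        int maxAttempts, String what)
            throws TimeoutException, InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return;
            }
            Thread.sleep(intervalMs);
        }
        throw new TimeoutException("Timed out waiting for " + what);
    }
}

class TriggerBeforeWaitDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in: replication completes only after a
        // block report has been processed.
        AtomicBoolean reportProcessed = new AtomicBoolean(false);
        BooleanSupplier fullyReplicated = reportProcessed::get;

        // Nothing triggers a report, so the bounded wait gives up.
        try {
            BoundedWait.waitFor(fullyReplicated, 10, 5, "3 replicas");
        } catch (TimeoutException e) {
            System.out.println(e.getMessage()); // Timed out waiting for 3 replicas
        }

        // Triggering the report first (cf. triggerBlockReports()) lets the wait return.
        reportProcessed.set(true);
        BoundedWait.waitFor(fullyReplicated, 10, 5, "3 replicas");
        System.out.println("reached 3 replicas");
    }
}
```

The point of the ordering is that the wait itself does nothing to make the condition true; if the event that completes replication only happens on the next periodic report, a short bounded wait can expire first.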



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
