[ 
https://issues.apache.org/jira/browse/HDFS-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018639#comment-15018639
 ] 

Wei-Chiu Chuang commented on HDFS-6101:
---------------------------------------

Thanks for the comments and reviews.
I was hesitant to continue because, on my local machine, this test still 
failed frequently after I made the change suggested by [~walter.k.su].

The only way I could avoid the failures was to reduce the number of concurrent 
writers. Originally the test had 10 writers and failed about 1 in 4 times; 
after I reduced the number to 5, it did not fail in 100 runs. This is why I 
suspect there is another issue with the default block placement policy 
(HDFS-9361, "Default block placement policy causes TestReplaceDataNodeOnFailure 
to fail intermittently"). When 10 writers begin to write at the same time, the 
policy will not let some writers set up pipelines with 3 datanodes, due to the 
load factor of the datanodes. With fewer writers, the load drops and the test 
passes.
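To illustrate the effect described above, here is a minimal sketch, not the actual HDFS source: the default placement policy can exclude a datanode whose active transceiver count exceeds a multiple (assumed here to be 2x) of the cluster-wide average. The class and method names (LoadFactorSketch, isGoodTarget, eligibleNodes) and the concrete numbers are illustrative assumptions, loosely modeled on the load check in BlockPlacementPolicyDefault.

```java
// Hedged sketch, not the real implementation: shows how a load-based
// placement check can shrink the set of eligible pipeline targets when
// many writers start at once on a small cluster.
public class LoadFactorSketch {

    // Assumed rule: a node is a good target only if its active
    // transceiver (xceiver) count is at most twice the cluster average.
    static boolean isGoodTarget(int nodeXceivers, double avgXceivers) {
        return nodeXceivers <= 2.0 * avgXceivers;
    }

    // Count how many nodes would still be accepted as pipeline targets
    // given each node's current xceiver count.
    static int eligibleNodes(int[] xceivers) {
        double avg = 0;
        for (int x : xceivers) {
            avg += x;
        }
        avg /= xceivers.length;
        int eligible = 0;
        for (int x : xceivers) {
            if (isGoodTarget(x, avg)) {
                eligible++;
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Hypothetical 3-node mini-cluster with skewed load from 10
        // writers: the busiest node exceeds 2x the average (10/3) and
        // is excluded, so only 2 targets remain for a 3-node pipeline.
        assert eligibleNodes(new int[]{9, 1, 0}) == 2;
        // With 5 writers and more even load, all 3 nodes stay eligible.
        assert eligibleNodes(new int[]{2, 2, 1}) == 3;
        System.out.println("ok");
    }
}
```

Under this assumed threshold, reducing the writer count keeps every node under 2x the average load, which matches the observed behavior of the test passing with 5 writers.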


> TestReplaceDatanodeOnFailure fails occasionally
> -----------------------------------------------
>
>                 Key: HDFS-6101
>                 URL: https://issues.apache.org/jira/browse/HDFS-6101
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Arpit Agarwal
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-6101.001.patch, HDFS-6101.002.patch, 
> HDFS-6101.003.patch, HDFS-6101.004.patch, HDFS-6101.005.patch, 
> TestReplaceDatanodeOnFailure.log
>
>
> Exception details in a comment below.
> The failure repros on both OS X and Linux if I run the test ~10 times in a 
> loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
