[
https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101065#comment-15101065
]
Kai Zheng commented on HDFS-9646:
---------------------------------
Using {{random.nextInt}} to generate the dead datanode indexes is good for the
fix here. On the other hand, the randomness may still miss some boundary cases
in some runs but not in others, so the test may pass most of the time and fail
occasionally. To help troubleshooting in case the test fails in the future, I
suggest the patch output the dead datanode list.
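As an illustration only, here is a minimal sketch of how a test might both pick the dead indexes with {{random.nextInt}} and print them for later troubleshooting; the class and method names below are hypothetical and not taken from the patch:
{code}
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper: choose distinct dead datanode indexes at random and
// log them, so a future test failure records exactly which nodes were killed.
public class DeadNodeChooser {
  public static int[] chooseDeadIndexes(int numDataNodes, int numDead, Random random) {
    boolean[] taken = new boolean[numDataNodes];
    int[] dead = new int[numDead];
    int count = 0;
    while (count < numDead) {
      int idx = random.nextInt(numDataNodes); // same randomness as the fix
      if (!taken[idx]) {
        taken[idx] = true;
        dead[count++] = idx;
      }
    }
    // Suggested troubleshooting output: the dead datanode list.
    System.out.println("Dead datanode indexes: " + Arrays.toString(dead));
    return dead;
  }
}
{code}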
> ErasureCodingWorker may fail when recovering data blocks with length less
> than the first internal block
> -------------------------------------------------------------------------------------------------------
>
> Key: HDFS-9646
> URL: https://issues.apache.org/jira/browse/HDFS-9646
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: erasure-coding
> Affects Versions: 3.0.0
> Reporter: Takuya Fukudome
> Assignee: Jing Zhao
> Priority: Critical
> Attachments: HDFS-9646.000.patch, HDFS-9646.001.patch,
> test-reconstruct-stripe-file.patch
>
>
> This is reported by [~tfukudom]: ErasureCodingWorker may fail with the
> following exception when recovering a non-full internal block.
> {code}
> 2016-01-06 11:14:44,740 WARN datanode.DataNode
> (ErasureCodingWorker.java:run(467)) - Failed to recover striped block:
> BP-987302662-172.29.4.13-1450757377698:blk_-9223372036854322288_29751
> java.io.IOException: Transfer failed for all targets.
> at
> org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455)
> {code}