[ 
https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099261#comment-15099261
 ] 

Jing Zhao commented on HDFS-9646:
---------------------------------

The bug can also be reproduced with the change in this [comment | 
https://issues.apache.org/jira/browse/HDFS-9646?focusedCommentId=15097200&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15097200],
 and the current {{generateDeadDnIndices}} can cover these scenarios.

For HDFS-9585, if some entries of targetStatus are false, the original code 
can throw an {{ArrayIndexOutOfBoundsException}}. So I'm not planning to 
include an extra unit test in this jira.

For now I will focus on the failure of {{TestReadStripedFileWithDecoding}}. 
Current observations:
# {{TestReadStripedFileWithDecoding}} actually fails to disable the 
replication monitor although it sets the max stream number to 0 (because the 
current test code does not pick up the configuration). 
# If we disable the replication monitor, 
{{TestReadStripedFileWithDecoding#testReadCorrectedData}} passes in my local 
runs.
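
To illustrate observation 1, here is a minimal, self-contained sketch of one 
way a test setting can fail to take effect. It is an assumption for 
illustration only: the comment does not say exactly how the test missed the 
configuration, and {{FakeCluster}}, {{MAX_STREAMS_KEY}} and the default value 
are hypothetical stand-ins, not Hadoop classes or keys. The sketch models a 
component that reads its settings once at construction, so a value set 
afterwards is never seen.

```java
import java.util.Properties;

public class ConfigPickupSketch {
    // Hypothetical key and default, for illustration only (not real Hadoop names).
    static final String MAX_STREAMS_KEY = "replication.max-streams";
    static final int DEFAULT_MAX_STREAMS = 2;

    /** Stand-in for a cluster that snapshots its settings when it is built. */
    static final class FakeCluster {
        private final int maxStreams;

        FakeCluster(Properties conf) {
            // The value is read exactly once, here.
            this.maxStreams = Integer.parseInt(
                conf.getProperty(MAX_STREAMS_KEY, String.valueOf(DEFAULT_MAX_STREAMS)));
        }

        /** With zero streams allowed, the monitor has nothing it may schedule. */
        boolean replicationMonitorSchedulesWork() {
            return maxStreams > 0;
        }
    }

    /** Sets the value after the cluster is built, so the cluster keeps the default. */
    static FakeCluster buggySetup() {
        Properties conf = new Properties();
        FakeCluster cluster = new FakeCluster(conf);
        conf.setProperty(MAX_STREAMS_KEY, "0"); // too late: already snapshotted
        return cluster;
    }

    /** Sets the value first and builds the cluster from that same object. */
    static FakeCluster fixedSetup() {
        Properties conf = new Properties();
        conf.setProperty(MAX_STREAMS_KEY, "0");
        return new FakeCluster(conf);
    }

    public static void main(String[] args) {
        System.out.println(buggySetup().replicationMonitorSchedulesWork()); // true
        System.out.println(fixedSetup().replicationMonitorSchedulesWork()); // false
    }
}
```

In the buggy variant the monitor stays active even though the test "set" the 
limit to 0, which matches the symptom described above.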

So I currently suspect there is a race between a reader and the recovery on 
the DN side. Will spend more time on this.

> ErasureCodingWorker may fail when recovering data blocks with length less 
> than the first internal block
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9646
>                 URL: https://issues.apache.org/jira/browse/HDFS-9646
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0
>            Reporter: Takuya Fukudome
>            Assignee: Jing Zhao
>            Priority: Critical
>         Attachments: HDFS-9646.000.patch, HDFS-9646.001.patch, 
> test-reconstruct-stripe-file.patch
>
>
> This is reported by [~tfukudom]: ErasureCodingWorker may fail with the 
> following exception when recovering a non-full internal block.
> {code}
> 2016-01-06 11:14:44,740 WARN  datanode.DataNode (ErasureCodingWorker.java:run(467)) - Failed to recover striped block: BP-987302662-172.29.4.13-1450757377698:blk_-9223372036854322288_29751
> java.io.IOException: Transfer failed for all targets.
>         at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
