[
https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097523#comment-15097523
]
Kai Zheng commented on HDFS-9646:
---------------------------------
The patch is a great fix, along with good refactoring. Some comments:
1. It's good to refactor and remove the duplicated code and computation around
{{getReadLength}}. Minor points: 1) it would be good to explicitly initialize
{{positionInBlock}} to 0 at the beginning of the {{run}} method; 2)
{{toRecover}} would be better kept under its original name, {{toRecoverLen}};
3) {{success}} could be {{successList}}.
2. In the test, introducing {{RecoveryType}} is nice. Suggestion: change {{Any}}
to {{Both}}, with logic that generates dead blocks from both the data blocks
and the parity blocks, so the test would be much more thorough. A minor point:
{{toDead}} could be {{toDie}}.
3. Question: do we need new test code to expose the issue and ensure that it is
fixed? I'm not sure about this, because the existing tests already cover all
sorts of file lengths but may lack the right one for the reported case as you
described above (the max length of the targeted blocks should be smaller than
the length of the first block).
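To make the case in point 3 concrete, here is a minimal sketch (not taken from
the patch) of how internal block lengths fall out of a striped layout, plus a
check for whether a given set of targets hits the reported case. The class and
method names are hypothetical, and the 64 KB cell size and 6 data blocks used
in {{main}} are assumptions for illustration only.

```java
// Hypothetical helper, not part of the patch: a sketch of striped-layout math.
class InternalBlockLengthSketch {

    /**
     * Length of internal block idx in a block group holding groupBytes of data.
     * Indices below dataBlocks are data blocks; higher indices are parity blocks.
     */
    static long internalBlockLength(long groupBytes, int cellSize,
                                    int dataBlocks, int idx) {
        long stripeBytes = (long) cellSize * dataBlocks;
        long fullStripes = groupBytes / stripeBytes;
        long lastStripeBytes = groupBytes % stripeBytes;
        long base = fullStripes * cellSize;
        if (lastStripeBytes == 0) {
            return base; // every internal block ends on a full stripe
        }
        if (idx >= dataBlocks) {
            // A parity block is as long as the first data block.
            return base + Math.min(lastStripeBytes, cellSize);
        }
        // Data in the last stripe fills cells in index order.
        long remaining = lastStripeBytes - (long) idx * cellSize;
        return base + Math.max(0L, Math.min(remaining, cellSize));
    }

    /** True if the longest target block is shorter than the first internal block. */
    static boolean hitsReportedCase(long groupBytes, int cellSize,
                                    int dataBlocks, int[] targets) {
        long first = internalBlockLength(groupBytes, cellSize, dataBlocks, 0);
        long maxTarget = 0;
        for (int t : targets) {
            maxTarget = Math.max(maxTarget,
                internalBlockLength(groupBytes, cellSize, dataBlocks, t));
        }
        return maxTarget < first;
    }

    public static void main(String[] args) {
        int cellSize = 64 * 1024;          // assumed cell size
        int dataBlocks = 6;                // assumed number of data blocks
        long groupBytes = cellSize + 100;  // one full cell plus 100 bytes
        System.out.println(
            hitsReportedCase(groupBytes, cellSize, dataBlocks, new int[] {1}));
        System.out.println(
            hitsReportedCase(groupBytes, cellSize, dataBlocks, new int[] {1, 6}));
    }
}
```

Under these assumptions, a test only exercises the reported case when every
block to recover is a short data block; adding a parity block (or the first
data block) to the targets makes the longest target as long as the first block
again, which may be why the existing file lengths did not expose it.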
> ErasureCodingWorker may fail when recovering data blocks with length less
> than the first internal block
> -------------------------------------------------------------------------------------------------------
>
> Key: HDFS-9646
> URL: https://issues.apache.org/jira/browse/HDFS-9646
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: erasure-coding
> Affects Versions: 3.0.0
> Reporter: Takuya Fukudome
> Assignee: Jing Zhao
> Priority: Critical
> Attachments: HDFS-9646.000.patch, test-reconstruct-stripe-file.patch
>
>
> This is reported by [~tfukudom]: ErasureCodingWorker may fail with the
> following exception when recovering a non-full internal block.
> {code}
> 2016-01-06 11:14:44,740 WARN datanode.DataNode (ErasureCodingWorker.java:run(467)) - Failed to recover striped block: BP-987302662-172.29.4.13-1450757377698:blk_-9223372036854322288_29751
> java.io.IOException: Transfer failed for all targets.
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)