[
https://issues.apache.org/jira/browse/HDFS-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097214#comment-15097214
]
Jing Zhao edited comment on HDFS-9646 at 1/13/16 10:59 PM:
-----------------------------------------------------------
{{ErasureCodingWorker#ReconstructAndTransferBlock}} uses the length of the
first internal block to decide whether to continue the recovery work:
{code}
long firstStripedBlockLength = getBlockLen(blockGroup, 0);
while (positionInBlock < firstStripedBlockLength) {
{code}
However, if we are recovering a block whose length is less than that of the
first one (e.g., in a partial last stripe like the following), the loop runs an
unnecessary iteration that produces a decoded result filled with zeros.
| b0 | b1 | b2 | b3 | b4 | b5 | p0 | p1 | p2 |
| 64k | 64k | 64k | 64k | | | 64k | 64k | 64k |
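To make the layout above concrete, here is a minimal sketch of how the per-block lengths in that table could be derived. It assumes 6 data blocks, 3 parity blocks and a 64k cell, as the table suggests. The class {{InternalBlockLengthSketch}} and its {{lengths}} method are hypothetical illustrations, not the {{getBlockLen}} used by {{ErasureCodingWorker}}.

```java
import java.util.Arrays;

class InternalBlockLengthSketch {
    /**
     * Hypothetical helper: length of every internal block of a striped
     * block group holding dataSize bytes of user data.
     */
    static long[] lengths(long dataSize, int cellSize, int numData, int numParity) {
        long stripeSize = (long) cellSize * numData;
        long fullStripes = dataSize / stripeSize;      // complete stripes
        long lastStripeLen = dataSize % stripeSize;    // bytes in the partial stripe
        long fullCells = lastStripeLen / cellSize;     // full cells in that stripe
        long partialCell = lastStripeLen % cellSize;   // bytes in the trailing cell

        long[] out = new long[numData + numParity];
        for (int i = 0; i < out.length; i++) {
            long len = fullStripes * cellSize;
            if (i >= numData) {
                // Parity cells are as long as the first cell of the last stripe.
                len += Math.min(cellSize, lastStripeLen);
            } else if (i < fullCells) {
                len += cellSize;
            } else if (i == fullCells) {
                len += partialCell;
            }
            out[i] = len;
        }
        return out;
    }

    public static void main(String[] args) {
        // Four 64k cells of data, as in the table: b4 and b5 stay empty.
        System.out.println(Arrays.toString(lengths(4L * 65536, 65536, 6, 3)));
    }
}
```

Under these assumptions b4 and b5 have length 0 while b0 has length 64k, so a loop bounded by the length of b0 still runs once when b4 or b5 is the block being recovered.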
Then at the end of {{recoverTargets}}, we set the limit of the decoding output
buffer based on the length of the block-to-be-recovered:
{code}
long blockLen = getBlockLen(blockGroup, targetIndices[i]);
long remaining = blockLen - positionInBlock;
if (remaining < 0) {
  targetBuffers[i].limit(0);
} else if (remaining < toRecoverLen) {
  targetBuffers[i].limit((int)remaining);
}
{code}
In the unnecessary iteration, this sets the buffer limit to 0, which causes
{{transferData2Targets}} to return 0.
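The arithmetic above can be checked in isolation with a small sketch. It mirrors the quoted limit adjustment on a plain {{java.nio.ByteBuffer}} and counts the loop iterations that leave nothing to transfer. The class {{BufferLimitSketch}} and its methods are hypothetical, and the sketch assumes {{positionInBlock}} advances by a fixed {{toRecoverLen}} per iteration, which may not match the real worker.

```java
import java.nio.ByteBuffer;

class BufferLimitSketch {
    /** Applies the quoted limit logic to one buffer and returns the bytes left to send. */
    static int bytesToTransfer(long blockLen, long positionInBlock, int toRecoverLen) {
        ByteBuffer target = ByteBuffer.allocate(toRecoverLen);
        long remaining = blockLen - positionInBlock;
        if (remaining < 0) {
            target.limit(0);
        } else if (remaining < toRecoverLen) {
            target.limit((int) remaining);
        }
        return target.remaining();
    }

    /** Counts iterations of a loop bounded by loopBound that produce zero bytes. */
    static int emptyIterations(long loopBound, long targetLen, int toRecoverLen) {
        int empty = 0;
        for (long pos = 0; pos < loopBound; pos += toRecoverLen) {
            if (bytesToTransfer(targetLen, pos, toRecoverLen) == 0) {
                empty++;
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        int cell = 64 * 1024;
        // Loop bounded by the first block (64k) while the target block is empty.
        System.out.println(emptyIterations(cell, 0, cell));
        // Loop bounded by the target block's own length.
        System.out.println(emptyIterations(0, 0, cell));
    }
}
```

In this model, bounding the loop by the first block's length yields one empty iteration for a zero-length target, whereas bounding it by the target's own length yields none. Whether the eventual patch takes that approach is not stated in this comment.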
> ErasureCodingWorker may fail when recovering data blocks with length less
> than the first internal block
> -------------------------------------------------------------------------------------------------------
>
> Key: HDFS-9646
> URL: https://issues.apache.org/jira/browse/HDFS-9646
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: erasure-coding
> Affects Versions: 3.0.0
> Reporter: Takuya Fukudome
> Assignee: Jing Zhao
> Priority: Critical
> Attachments: test-reconstruct-stripe-file.patch
>
>
> This is reported by [~tfukudom]: ErasureCodingWorker may fail with the
> following exception when recovering a non-full internal block.
> {code}
> 2016-01-06 11:14:44,740 WARN datanode.DataNode (ErasureCodingWorker.java:run(467)) - Failed to recover striped block: BP-987302662-172.29.4.13-1450757377698:blk_-9223372036854322288_29751
> java.io.IOException: Transfer failed for all targets.
>     at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$ReconstructAndTransferBlock.run(ErasureCodingWorker.java:455)
> {code}