[
https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yi Liu updated HDFS-7348:
-------------------------
Attachment: HDFS-7348.002.patch
Thanks Zhe for the good comments.
I updated the patch according to our discussion and addressed the comments. Main
changes in the patch:
*1.* The buffer size is configurable now; the default size is 256KB, the same as
the default cell size.
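For illustration, the configurable buffer size could be wired up roughly as below. This is a self-contained sketch, not the patch's actual code: the key name `dfs.datanode.stripedread.buffer.size` is an assumption, and a plain `Map` stands in for Hadoop's `Configuration`.

```java
import java.util.Map;

// Sketch of a configurable striped-read buffer size with a 256KB default.
// The config key name below is hypothetical, not necessarily what the patch uses.
class StripedReadBufferConfig {
  static final String BUFFER_SIZE_KEY =
      "dfs.datanode.stripedread.buffer.size";           // hypothetical key
  static final int BUFFER_SIZE_DEFAULT = 256 * 1024;    // 256KB, same as default cell size

  /** Returns the configured buffer size, falling back to the 256KB default. */
  static int getBufferSize(Map<String, String> conf) {
    String v = conf.get(BUFFER_SIZE_KEY);
    return v == null ? BUFFER_SIZE_DEFAULT : Integer.parseInt(v);
  }
}
```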
*2.* Add encode and decode logic for recovery. If all of the missed blocks are
parity blocks, we only need to encode; there is a possible improvement here, for
which I filed HADOOP-11908. If one of the missed blocks is a data block, we need
to decode. Currently I found that decode only works for data blocks, and we also
need to prepare full inputs, as Zhe said. So the decode logic in the patch is a
workaround and only works when the missed blocks are data blocks, up to
parityBlkNum of them. We can update it after HADOOP-11847.
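The encode-vs-decode decision above can be sketched as follows. This is only an illustrative model, assuming an RS schema where block indices [0, dataBlkNum) are data blocks and [dataBlkNum, dataBlkNum + parityBlkNum) are parity blocks (e.g. 0-5 data, 6-8 parity for RS-6-3); the class and method names are hypothetical, not the patch's.

```java
// Sketch of the encode-vs-decode decision for striped block recovery.
// Assumes indices [0, dataBlkNum) are data blocks, the rest are parity.
class RecoveryPlanner {
  /** True when every missed block is a parity block, so recovery can re-encode. */
  static boolean isEncodeOnly(int[] missedIndices, int dataBlkNum) {
    for (int idx : missedIndices) {
      if (idx < dataBlkNum) {
        return false; // a data block is missing -> must decode
      }
    }
    return true;
  }

  /**
   * The workaround decode path only handles missing data blocks, and at most
   * parityBlkNum of them (the RS erasure-tolerance limit).
   */
  static boolean workaroundDecodeSupported(int[] missedIndices,
                                           int dataBlkNum, int parityBlkNum) {
    if (missedIndices.length > parityBlkNum) {
      return false; // more erasures than RS can tolerate
    }
    for (int idx : missedIndices) {
      if (idx >= dataBlkNum) {
        return false; // workaround cannot decode a missing parity block
      }
    }
    return true;
  }
}
```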
*3.* Enhanced the test cases, and they succeed in my local environment.
Zhe, the following replies to some of your comments; I address your other
comments in the patch:
{quote}
Why do we need targetInputStreams?
{quote}
My original design used it to do packet ack checking; we can do that in phase 2,
so I removed it from the current patch.
{quote}
The test failed on my local machine, reporting NPE when closing file
{quote}
I found it's a bug in existing code and filed HDFS-8313 for it. The exception
only occurs intermittently.
{quote}
cluster#stopDataNode might be an easier way to kill a DN?
{quote}
{{stopDataNode}} can only shut down the DN, and the NN needs to wait a long time
before marking the datanode as dead. So, as I said in the test comment, we need
to clear its update time and trigger the NN to check heartbeats; then the NN will
mark the datanode as dead immediately and can schedule striped block recovery.
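Why clearing the update time works can be modeled with a toy sketch (this is not the MiniDFSCluster or BlockManager test API; the class, fields, and the expiry constant below are illustrative only): the NN's heartbeat check classifies a node as dead once its last update is older than the expiry interval, so zeroing the update time makes the very next check succeed instead of waiting out the interval.

```java
// Toy model of the "clear update time, then trigger heartbeat check" trick.
// Real tests use Hadoop's test utilities; this only illustrates the mechanism.
class HeartbeatModel {
  static final long EXPIRY_MILLIS = 10 * 60 * 1000; // illustrative dead-node interval

  long lastUpdateMillis;

  HeartbeatModel(long lastUpdateMillis) {
    this.lastUpdateMillis = lastUpdateMillis;
  }

  /** Simulates clearing the datanode's update time in a test. */
  void clearUpdateTime() {
    lastUpdateMillis = 0;
  }

  /** Simulates the NN heartbeat check: dead once the update time is too old. */
  boolean isDead(long nowMillis) {
    return nowMillis - lastUpdateMillis > EXPIRY_MILLIS;
  }
}
```

After `clearUpdateTime()`, the next `isDead(now)` check returns true immediately, so recovery can be scheduled without waiting for the normal expiry.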
{quote}
Should WRITE_PACKET_SIZE be linked to BlockSender#MIN_BUFFER_WITH_TRANSFERTO
{quote}
{{BlockSender#MIN_BUFFER_WITH_TRANSFERTO}} is used for transferring contiguous
block replicas, which is a little different (it transfers the file directly), so
I don't want to tie this constant to it. I think it's fine to define the value
directly.
{quote}
Follow on: we should consider consolidating the init thread pool logic for
hedged read, client striped read, and DN striped read.
{quote}
Yes, we can do that in a follow-on.
> Erasure Coding: striped block recovery
> --------------------------------------
>
> Key: HDFS-7348
> URL: https://issues.apache.org/jira/browse/HDFS-7348
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Kai Zheng
> Assignee: Yi Liu
> Attachments: ECWorker.java, HDFS-7348.001.patch, HDFS-7348.002.patch
>
>
> This JIRA is to recover one or more missed striped blocks in a striped block
> group.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)