[
https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520686#comment-14520686
]
Yi Liu commented on HDFS-7348:
------------------------------
Thanks Bo and Zhe for the discussion.
{quote}
On the write path:
...
{quote}
In the current implementation the recovery node is one of the source nodes; of
course we can change this so it can also be one of the targets in the future.
Actually there is no big difference (we can read locally in the first case, and
write locally in the second), and both are reasonable to me.
Assume the recovery node is one of the targets (the final destination, the
"fast track" as you said); that is an optimization of the current patch. We
need to write the decoded block directly to the target folder (just as when we
receive a continuous block replication in the datanode), but we don't need
{{DataNode#DataTransfer}}. If there is more than one target, each remaining
target gets a different decoded block, and we need to transfer that block to
its own target.
The situation becomes:
1) If a target is remote, we send its block to it directly.
2) If a target is local, we write its block locally directly.
So it never happens that we save a decoded block locally and then have to
transfer it again.
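The dispatch rule above can be sketched roughly as follows. This is a minimal illustration, not the actual datanode code; the class, enum, and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of the routing rule: each decoded block goes straight to
 * its target, either by a local write or a direct network send, and is
 * never staged locally first.
 */
public class RecoveryDispatch {

    /** How a decoded block reaches its target datanode. */
    public enum Route { LOCAL_WRITE, REMOTE_SEND }

    /**
     * Decide the route for each target. A target on the recovery node
     * itself is written locally; every other target is sent its block
     * directly over the wire, with no intermediate local copy.
     */
    public static List<Route> routeTargets(List<String> targetDnIds,
                                           String localDnId) {
        List<Route> routes = new ArrayList<>();
        for (String dn : targetDnIds) {
            routes.add(dn.equals(localDnId) ? Route.LOCAL_WRITE
                                            : Route.REMOTE_SEND);
        }
        return routes;
    }
}
```

Note the invariant this encodes: the decision is made per target before any block is persisted, which is why the "save locally, then transfer" case cannot arise.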
Hi guys, sending a block directly is not a big deal; we don't need to force the
use of {{DataNode#DataTransfer}}. Of course we could factor out the common
part, and furthermore, as I said before, we may need to check the packet ack in
the future.
{quote}
On the read path:
....
{quote}
Currently the default cell size is 256KB, which is not a small value; we can
also expose it as a configuration option.
As for sequential vs. parallel reading, if it's a hard decision, I can make
that configurable too.
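The two knobs could look roughly like this. The property names below are hypothetical, not actual HDFS configuration keys, and a plain {{Map}} stands in for Hadoop's {{Configuration}} class to keep the sketch self-contained.

```java
import java.util.Map;

/**
 * Sketch of the two proposed knobs: striped cell size (256KB default, as
 * discussed above) and a switch between sequential and parallel reading
 * of source blocks during recovery. Key names are illustrative only.
 */
public class RecoveryConf {
    // Hypothetical property names, for illustration only.
    static final String CELL_SIZE_KEY = "dfs.striped.cell.size";
    static final int CELL_SIZE_DEFAULT = 256 * 1024; // 256KB

    static final String SEQUENTIAL_READ_KEY =
        "dfs.striped.recovery.sequential.read";
    static final boolean SEQUENTIAL_READ_DEFAULT = false; // parallel by default

    private final Map<String, String> props;

    RecoveryConf(Map<String, String> props) {
        this.props = props;
    }

    /** Cell size in bytes, falling back to the 256KB default. */
    int cellSize() {
        String v = props.get(CELL_SIZE_KEY);
        return v == null ? CELL_SIZE_DEFAULT : Integer.parseInt(v);
    }

    /** Whether source blocks are read sequentially rather than in parallel. */
    boolean sequentialRead() {
        String v = props.get(SEQUENTIAL_READ_KEY);
        return v == null ? SEQUENTIAL_READ_DEFAULT : Boolean.parseBoolean(v);
    }
}
```

In the real patch these would presumably be read through Hadoop's {{Configuration#getInt}} / {{Configuration#getBoolean}} with the same defaults.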
So guys, let's reach a *conclusion*. How about we do the following:
*1.* Mainly keep the current approach in the patch. Also, as I said in the
patch design, we need one optimization: if a source is local, we should read it
directly. [~zhz], I think you can do this further improvement in your block
reader patch, maybe in phase 2, since it will not block the functionality?
*2.* About writing locally: certainly, if the recovery node is one of the
targets, we should write the decoded block directly to that datanode. Since the
write goes directly to the target location, just as when we receive a
continuous block replication in the datanode, we don't need to transfer it
again. [~libo-intel], you can do this further improvement in your block writer
patch, maybe also in phase 2, since currently it will not block the
functionality?
*3.* I will make the buffer size configurable, and also make sequential vs.
parallel reading configurable, in a follow-on JIRA?
*4.* I will file a follow-on JIRA to check the packet ack, and we can do it
later?
The remaining thing is to wait for the decode support in HADOOP-11847, after
which I will update that part of the patch along with the test. Of course,
please review the existing code too in the meantime. Thanks, does that sound
good?
> Erasure Coding: striped block recovery
> --------------------------------------
>
> Key: HDFS-7348
> URL: https://issues.apache.org/jira/browse/HDFS-7348
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Kai Zheng
> Assignee: Yi Liu
> Attachments: ECWorker.java, HDFS-7348.001.patch
>
>
> This JIRA is to recover one or more missed striped block in the striped block
> group.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)