[ 
https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518630#comment-14518630
 ] 

Yi Liu commented on HDFS-7348:
------------------------------

Thanks [~zhz] for the review! The comments are helpful.

{quote}
At the DN level I don't think we need to care about cellSize? Since we always 
recover entire blocks, the client-side logic taking care of cells can be 
simplified here.
{quote}
Yes, the DN doesn't need to care about cellSize. Here we actually just use it 
as a read buffer size, and since {{bytesPerChecksum}} divides it evenly, it's 
convenient for CRC calculation. 
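To illustrate the alignment point (a hypothetical sketch, not code from the patch; the class and method names are made up): when {{bytesPerChecksum}} divides the buffer size evenly, every filled buffer contains only whole checksum chunks, so CRCs can be computed without carrying partial-chunk state across reads.

```java
// Hypothetical sketch: choose a read buffer size that bytesPerChecksum
// divides evenly, so each buffer holds only whole checksum chunks.
public class BufferSizing {

  // Round the preferred size (e.g. the cell size) down to the nearest
  // multiple of bytesPerChecksum; never go below one checksum chunk.
  public static int alignedBufferSize(int preferredSize, int bytesPerChecksum) {
    if (preferredSize % bytesPerChecksum == 0) {
      return preferredSize;             // already aligned, e.g. 64KB with 512
    }
    return Math.max(bytesPerChecksum,
        (preferredSize / bytesPerChecksum) * bytesPerChecksum);
  }
}
```

With the common defaults (64KB cell, 512-byte checksum chunks) the buffer is already aligned, which is why reusing cellSize as the buffer size is convenient.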

{quote}
Since recovering multiple missing blocks at once is a pretty rare case, should 
we just reconstruct all missing blocks and use DataNode#DataTransfer to push 
them out?
Should we save a copy of reconstructed block locally? More space will be used; 
but it will avoid re-decoding if push fails.
{quote}
Good question and discussion. The best way to avoid re-decoding if the push 
fails is to check the packet ack before we discard the decoded result and start 
the next decoding round. Saving a copy locally would increase the DataNode's 
burden (it affects performance, disk space/management, and requires calculating 
the CRC multiple times). If we don't check the packet ack, we can't know 
whether the recovered block was transferred correctly; if we do check the 
packet ack, we don't need to save a local copy at all.
As described in the design above and in the inline code comments, we currently 
don't check the packet ack, which is similar to contiguous block replication. 
But since EC recovery is more expensive, we could consider checking the packet 
ack as a further improvement. I can do that (check the packet ack) in a 
separate JIRA, maybe in phase 2; of course we can discuss more here.
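A rough sketch of the ack-before-discard idea (all names here are hypothetical, not from the patch): each decoded round is retained until its packet ack arrives, so a failed transfer can be resent from the retained buffer instead of decoding again.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: retain each decoded buffer until its packet ack
// arrives, so a failed push can be retried without re-decoding.
public class AckGatedSender {

  // Decoded rounds that have been sent but not yet acknowledged.
  private final Deque<byte[]> unacked = new ArrayDeque<>();

  // Send one decoded round; keep the buffer until it is acked.
  public void send(byte[] decoded) {
    unacked.addLast(decoded);
  }

  // Ack received for the oldest outstanding round: its buffer may now
  // be discarded and the next decoding round can safely start.
  public void onAck() {
    unacked.pollFirst();
  }

  // If the transfer fails, these buffers can be resent as-is.
  public int pendingForResend() {
    return unacked.size();
  }
}
```

The trade-off matches the comment above: the sender holds at most a few in-flight buffers in memory, instead of writing a full recovered block to local disk.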

{quote}
I filed HDFS-8282 to move StripedReadResult and waitNextCompletion to 
StripedBlockUtil.
{quote}
Sounds good; I will review that JIRA once it's ready.

{quote}
In foreground recovery we read in parallel to minimize latency. It's an 
interesting design question whether we should do the same in background 
recovery. More discussions are needed here.
{quote}
We can discuss this point more here. I think parallel reads are OK and I don't 
see a downside: if we don't recover the block as soon as possible in the DN, 
each client also needs to do on-line read recovery, which may cause even more 
network IO (with multiple clients).
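For example, reading the surviving source blocks in parallel in background recovery could be sketched like this (a hypothetical helper, not the patch's code), mirroring what the foreground path already does to minimize latency:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: issue all source-block reads at once and wait for
// the results, instead of reading the sources one at a time.
public class ParallelStripeReader {

  public static List<byte[]> readSources(List<Callable<byte[]>> readers) {
    ExecutorService pool = Executors.newFixedThreadPool(readers.size());
    try {
      List<byte[]> chunks = new ArrayList<>();
      // invokeAll runs every reader concurrently and waits for completion.
      for (Future<byte[]> f : pool.invokeAll(readers)) {
        chunks.add(f.get());
      }
      return chunks;
    } catch (InterruptedException | ExecutionException e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

The latency of one decoding round then tracks the slowest source DN rather than the sum of all reads.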

{quote}
 Another option is to read entire blocks and then decode
{quote}
That's a big issue for memory, especially since there may be multiple striped 
block recoveries running at the same time; I think we should not do this. On 
the other hand, the fastest way to decode is to use native code and utilize CPU 
instructions, as we planned in the design. I have experience from writing the 
native decryption code for the HDFS encryption-at-rest feature, where we also 
used a buffer (default 64KB) to invoke JNI.
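A minimal sketch of the chunked approach (names are hypothetical, and plain XOR stands in for the real erasure decoder): only a fixed buffer's worth of each source is handed to the decoder per step, which is also the natural unit to pass across JNI.

```java
// Hypothetical sketch: decode in fixed-size chunks (e.g. 64KB) rather
// than materializing entire blocks for the decoder at once. XOR is a
// stand-in for the real Reed-Solomon decode; the output array is shown
// whole here for simplicity, but a real implementation would stream it.
public class ChunkedDecode {

  static final int BUF = 64 * 1024;   // per-round decode buffer

  public static byte[] decode(byte[][] sources, int blockLen) {
    byte[] out = new byte[blockLen];
    for (int off = 0; off < blockLen; off += BUF) {
      int len = Math.min(BUF, blockLen - off);
      // Combine the same BUF-sized slice of every source block.
      for (byte[] src : sources) {
        for (int i = 0; i < len; i++) {
          out[off + i] ^= src[off + i];   // placeholder for RS decode
        }
      }
    }
    return out;
  }
}
```

Memory per recovery task then stays proportional to the buffer size times the stripe width, independent of block size.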


{quote}
Maybe we can move getBlock to StripedBlockUtil too; it's a useful util to only 
parse the Block. If it sounds good to you I'll move it in HDFS-8282.
{quote}
That works for me if you move it in HDFS-8282; I think we will also need to use 
it in the future :)

I will fix the {{ArrayList<>}} initialization in the next patch.

> Erasure Coding: striped block recovery
> --------------------------------------
>
>                 Key: HDFS-7348
>                 URL: https://issues.apache.org/jira/browse/HDFS-7348
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Yi Liu
>         Attachments: ECWorker.java, HDFS-7348.001.patch
>
>
> This JIRA is to recover one or more missing striped blocks in a striped block 
> group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
