[jira] [Commented] (HDFS-7344) Erasure Coding worker and support in DataNode

Zhe Zhang (JIRA) Fri, 09 Jan 2015 13:31:54 -0800

    [ https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271900#comment-14271900 ]


Zhe Zhang commented on HDFS-7344:
---------------------------------

Some quick comments:
Cosmetics:
# It seems the {{ec}} package should at least be under {{hdfs/}}?
# All test classes should be under {{src/test}} instead of {{src/main}}.

Code logic:
# Looks like we need an updated design doc?
# In general, I think the client implementation (HDFS-7545) should go in before 
the DN support:
** Client support is needed for regular I/O, while the DN is only involved in 
recovery and conversion.
** I see that the DN patch here tries to reuse the client-side striping/codec 
logic (e.g., {{ECRemoteBlockWriter}}). It would help to finalize the client 
code itself first.
# Apparently {{ECRemoteBlockWriter}} is currently a copy of {{DFSOutputStream}}. 
Many complex components and much of the logic in {{DFSOutputStream}} (e.g., 
{{DataStreamer}}) are only useful on the client side. For example, it needs a 
{{dataQueue}} to buffer packets because a client might write data slowly and in 
small units. The client write pipeline is very complicated and should be avoided 
if possible. How much benefit is there for a DN to transfer recovered/converted 
data to peer DNs in small units, rather than after the entire block has been 
recovered/converted?
# How does the DN initiate {{ECWorker}}?
# {{ECRemoteBlockReader}} extends {{ECBlockReaderBase}}, which implements 
{{ECBlockReader}}: is this abstraction necessary? I.e., besides 
{{ECRemoteBlockReader}}, what other block readers could extend 
{{ECBlockReaderBase}}?
# In general, rather than referring to and leveraging client-side reader/writer 
code, I think we should refer to DN-side transfer functions such as 
{{DataNode#transferBlock}}, which are much simpler.
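
To make the "transfer after the entire block is recovered" alternative concrete, here is a minimal, hypothetical sketch. It is not the actual {{DataNode#transferBlock}} code, and {{RecoveredBlockSender}} / {{sendRecoveredBlock}} are made-up names. Once a block has been fully reconstructed, the DN simply copies it to a peer's stream in fixed-size chunks, with no {{dataQueue}}, ack queue, or streamer thread. A plain {{java.io.OutputStream}} stands in for the real DN-to-DN connection (an assumption for illustration only).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical sketch: after a block has been fully reconstructed on the DN,
 * send it to a peer by copying fixed-size chunks to a plain stream.
 * No packet queue, ack queue, or background streamer thread is involved.
 */
public class RecoveredBlockSender {

    /** Writes the whole reconstructed block in chunks; returns the chunk count. */
    static int sendRecoveredBlock(byte[] block, OutputStream out, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        for (int off = 0; off < block.length; off += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            int len = Math.min(chunkSize, block.length - off);
            out.write(block, off, len);
            chunks++;
        }
        out.flush();
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        byte[] block = new byte[10];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) i;
        }
        // ByteArrayOutputStream stands in for the connection to the peer DN.
        ByteArrayOutputStream peer = new ByteArrayOutputStream();
        int chunks = sendRecoveredBlock(block, peer, 4);
        System.out.println(chunks + " chunks, " + peer.size() + " bytes"); // 3 chunks, 10 bytes
    }
}
```

Because the block is complete before sending starts, the sender never waits on a slow producer, which is the situation the client-side packet buffering is designed for.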

> Erasure Coding worker and support in DataNode
> ---------------------------------------------
>
>                 Key: HDFS-7344
>                 URL: https://issues.apache.org/jira/browse/HDFS-7344
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Li Bo
>         Attachments: HDFS ECWorker Design.pdf, hdfs-ec-datanode.0108.zip, 
> hdfs-ec-datanode.0108.zip
>
>
> According to HDFS-7285 and the design, this handles DataNode side extension 
> and related support for Erasure Coding, and implements ECWorker. It mainly 
> covers the following aspects, and separate tasks may be opened to handle each 
> of them.
> * Process encoding work, calculating parity blocks as specified in block 
> groups and codec schema;
> * Process decoding work, recovering data blocks according to block groups and 
> codec schema;
> * Handle client requests for passive recovery of block data, serving data on 
> demand while reconstructing;
> * Write parity blocks according to storage policy.
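
The encoding and decoding work items above can be illustrated with a toy block group. This is a hypothetical sketch that uses a single XOR parity cell purely for brevity; the codec schema in the design (e.g., Reed-Solomon) may differ and can tolerate more than one lost block. {{XorBlockGroup}}, {{encodeParity}}, and {{recoverCell}} are made-up names.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of encoding/decoding within one block group using a
 * single XOR parity cell. Real EC codecs (e.g., Reed-Solomon) can survive
 * more than one loss; XOR is used here only to keep the example short.
 */
public class XorBlockGroup {

    /** Encoding: parity[i] is the XOR of byte i across all data cells. */
    static byte[] encodeParity(byte[][] dataCells) {
        byte[] parity = new byte[dataCells[0].length];
        for (byte[] cell : dataCells) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= cell[i];
            }
        }
        return parity;
    }

    /** Decoding: rebuild the single missing data cell from the survivors and the parity. */
    static byte[] recoverCell(byte[][] dataCells, int missing, byte[] parity) {
        byte[] recovered = parity.clone();
        for (int c = 0; c < dataCells.length; c++) {
            if (c == missing) {
                continue; // this cell is lost, so it is never read
            }
            for (int i = 0; i < recovered.length; i++) {
                recovered[i] ^= dataCells[c][i];
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        byte[][] data = {
            {1, 2, 3, 4},
            {5, 6, 7, 8},
            {9, 10, 11, 12}
        };
        byte[] parity = encodeParity(data);
        byte[] lost = data[1];
        data[1] = null; // simulate losing data cell 1
        byte[] rebuilt = recoverCell(data, 1, parity);
        System.out.println(Arrays.equals(lost, rebuilt)); // true
    }
}
```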



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)