[
https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271900#comment-14271900
]
Zhe Zhang commented on HDFS-7344:
---------------------------------
Some quick comments:
Cosmetics:
# It seems the {{ec}} package should at least be under hdfs/.
# All test classes should be under src/test instead of src/main.
Code logic:
# Looks like we need an updated design doc?
# In general I think the client implementation (HDFS-7545) should go in before
the DN work:
** Client support is needed for regular I/O, while the DN is only involved in
recovery and conversion.
** The DN patch here tries to reuse the client-side striping/codec logic
(e.g., {{ECRemoteBlockWriter}}), so it would help to finalize the client code
first.
# {{ECRemoteBlockWriter}} currently appears to be a copy of
{{DFSOutputStream}}. Many of the complex components in {{DFSOutputStream}}
(e.g., {{DataStreamer}}) and much of its logic are only useful on the client
side. For example, {{DFSOutputStream}} needs a {{dataQueue}} to buffer packets
because a client might write data slowly and in small units. The client write
pipeline is very complicated and should be avoided if possible. How much
benefit is there in having a DN transfer recovered/converted data to peer DNs
in small units, rather than after the entire block has been
recovered/converted?
# How does the DN initiate {{ECWorker}}?
# {{ECRemoteBlockReader}} extends {{ECBlockReaderBase}}, which implements
{{ECBlockReader}}: is this abstraction necessary? I.e., apart from
{{ECRemoteBlockReader}}, what other block readers could extend
{{ECBlockReaderBase}}?
# In general, rather than building on the client-side reader/writer code, I
think we should model this on DN-side transfer functions such as
{{DataNode#transferBlock}}, which are much simpler.
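
To make the "recover the whole block first, then transfer" idea concrete, here
is a rough, self-contained sketch. It is not taken from the patch and it is not
how {{DataNode#transferBlock}} is implemented: every class and method name
below is hypothetical, XOR is only a single-parity stand-in for whatever codec
the design ends up using, and a {{ByteArrayOutputStream}} stands in for the
connection to a peer DN. The point is only that, once the block is fully in
hand, the sender needs neither a packet queue nor a streamer thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical sketch; none of these names exist in HDFS or in the patch. */
final class WholeBlockTransferSketch {

    /**
     * Rebuilds one missing block by XOR-ing the surviving blocks of a group.
     * XOR is an assumption made for brevity, not the codec from the design.
     */
    static byte[] recoverBlock(byte[][] survivors) {
        if (survivors.length == 0) {
            throw new IllegalArgumentException("need at least one surviving block");
        }
        byte[] recovered = new byte[survivors[0].length];
        for (byte[] block : survivors) {
            if (block.length != recovered.length) {
                throw new IllegalArgumentException("blocks must have equal length");
            }
            for (int i = 0; i < recovered.length; i++) {
                recovered[i] ^= block[i];
            }
        }
        return recovered;
    }

    /**
     * Sends an already recovered block to a peer in fixed-size chunks, straight
     * from the buffer. Nothing is queued, because the whole block is available
     * before the transfer starts.
     */
    static void transferWholeBlock(byte[] block, OutputStream peer, int chunkSize)
            throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        for (int off = 0; off < block.length; off += chunkSize) {
            int len = Math.min(chunkSize, block.length - off);
            peer.write(block, off, len);
        }
        peer.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] d0 = {1, 2, 3, 4};
        byte[] d1 = {5, 6, 7, 8};
        byte[] parity = recoverBlock(new byte[][] {d0, d1});      // parity = d0 XOR d1
        byte[] rebuilt = recoverBlock(new byte[][] {d0, parity}); // pretend d1 was lost
        ByteArrayOutputStream peer = new ByteArrayOutputStream(); // stand-in for a peer DN
        transferWholeBlock(rebuilt, peer, 3);
        System.out.println(Arrays.equals(d1, peer.toByteArray()));
    }
}
```

The trade-off is memory and latency: the whole block has to be buffered (or
staged on local disk) before the first byte is sent, which is the cost to weigh
against the complexity of the client-style pipeline.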
> Erasure Coding worker and support in DataNode
> ---------------------------------------------
>
> Key: HDFS-7344
> URL: https://issues.apache.org/jira/browse/HDFS-7344
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Reporter: Kai Zheng
> Assignee: Li Bo
> Attachments: HDFS ECWorker Design.pdf, hdfs-ec-datanode.0108.zip,
> hdfs-ec-datanode.0108.zip
>
>
> According to HDFS-7285 and the design, this handles DataNode side extension
> and related support for Erasure Coding, and implements ECWorker. It mainly
> covers the following aspects, and separate tasks may be opened to handle each
> of them.
> * Process encoding work, calculating parity blocks as specified in block
> groups and codec schema;
> * Process decoding work, recovering data blocks according to block groups and
> codec schema;
> * Handle client requests for passively recovered block data, serving data on
> demand while reconstructing;
> * Write parity blocks according to storage policy.