[ https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273239#comment-14273239 ]

Li Bo commented on HDFS-7344:
-----------------------------

Thanks Zhe for your comments.
Yes, you're right, we'd better implement the client side work first. In fact, the 
uploaded code is about non-striping encode/decode, written before we decided to 
implement striping first. Because the basic idea is similar and 
BlockReader/BlockWriter will be reused, I hope we can get some feedback to 
help further development.
The ec package can be refactored to a proper place. For the DN side EC work, 
multiple blocks may be generated. If we send these blocks only after they are 
entirely generated, each EC work will consume a lot of memory, typically 
4*128M~6*128M (4-6 blocks of 128 MB each), and there may be several EC works 
running on a DN at the same time. So a better choice is to allocate a bounded 
buffer for each EC work (producer-consumer model). When the buffer is full, the 
encoder/decoder will wait for the BlockWriter to write the buffered data locally 
or remotely; a rough sketch of this follows below.
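A minimal sketch of the bounded-buffer idea, just to make the producer-consumer flow concrete. The class and method names (EcWorkBuffer etc.) are only illustrative, not actual HDFS APIs:

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical per-EC-work buffer: the encoder/decoder produces encoded
// chunks, the BlockWriter consumes them; the bound keeps memory per EC work
// small instead of holding 4-6 whole 128 MB blocks in memory.
public class EcWorkBuffer {
  private final BlockingQueue<byte[]> chunks;

  public EcWorkBuffer(int capacityChunks) {
    this.chunks = new ArrayBlockingQueue<>(capacityChunks);
  }

  // Called by the encoder/decoder (producer); blocks when the buffer is full,
  // so encoding waits for the writer to drain data locally or remotely.
  public void put(byte[] encodedChunk) throws InterruptedException {
    chunks.put(encodedChunk);
  }

  // Called by the BlockWriter (consumer); blocks when the buffer is empty.
  public byte[] take() throws InterruptedException {
    return chunks.take();
  }
}
{code}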
BlockReader and BlockWriter will have several subclasses, depending on whether 
the data is read/written locally or remotely, and whether they run in a datanode 
or in the client. We can refine the logic to get the best efficiency for the 
different classes. Each DN has one ECWorker; when a DN receives an 
encoding/decoding work from the namenode, it forwards the work to the ECWorker. 
A rough sketch of this structure is below.
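To illustrate the structure I have in mind (again, these interfaces and names are only a sketch under my current assumptions, not committed HDFS code):

{code:java}
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical writer abstraction; subclasses differ in where the data goes
// and where the code runs (DN or client).
interface BlockWriter {
  void write(byte[] chunk) throws IOException;
}

class LocalBlockWriter implements BlockWriter {
  // Writes the chunk to a replica file on this DN's local storage.
  public void write(byte[] chunk) throws IOException { /* local file write */ }
}

class RemoteBlockWriter implements BlockWriter {
  // Streams the chunk to another DN over the network.
  public void write(byte[] chunk) throws IOException { /* network transfer */ }
}

// One ECWorker per DataNode; encoding/decoding commands received from the
// namenode are handed to it and executed asynchronously.
class ECWorker {
  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  void submit(Runnable ecTask) {
    pool.submit(ecTask);
  }
}
{code}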
The DN may already contain some logic similar to BlockWriter/BlockReader, but it 
is complex to extend or reuse it. For example, BlockSender sends a block to a 
remote DN, but it reads the block from disk. We could replace the input stream 
or extend that class, but I think completely rewriting the logic seems better.


> Erasure Coding worker and support in DataNode
> ---------------------------------------------
>
>                 Key: HDFS-7344
>                 URL: https://issues.apache.org/jira/browse/HDFS-7344
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Li Bo
>         Attachments: HDFS ECWorker Design.pdf, hdfs-ec-datanode.0108.zip
>
>
> According to HDFS-7285 and the design, this handles DataNode side extension 
> and related support for Erasure Coding, and implements ECWorker. It mainly 
> covers the following aspects, and separate tasks may be opened to handle each 
> of them.
> * Process encoding work, calculating parity blocks as specified in block 
> groups and codec schema;
> * Process decoding work, recovering data blocks according to block groups and 
> codec schema;
> * Handle client requests for passive recovery of block data, serving data on 
> demand while reconstructing;
> * Write parity blocks according to storage policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
