[ https://issues.apache.org/jira/browse/HDFS-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539108#comment-14539108 ]

Kai Zheng commented on HDFS-8347:
---------------------------------

After a detailed discussion with [~zhz], [~hitliuyi] and others, it looks like 
we have a good approach now.
Allowing the coder caller to pass in and encode/decode a variable width of data 
is the most flexible and ideal option for the HDFS client and DataNode, and the 
typical RS coder supports this by its nature, so we would get rid of the 
constraint of a predefined {{chunkSize}} setting shared between encoder and 
decoder. As the Hitchhiker coder has to use the same data width in decoding as 
in encoding, we could have a wrapper for it that handles input data aggregation 
and splitting, accepting inputs of arbitrary width and aligning them with the 
underlying coder's fixed encoding/decoding input width. Originally this 
aggregation and splitting was supposed to be done by the coder caller, but that 
looks hard to do since handling the multiple striping DataNode streamers is 
already complex enough, so it is better to move the tweak into the coder layer 
for the caller's convenience. The erasure coder is currently designed, 
implemented and testable outside of HDFS specifics, so it is relatively easy to 
get it right even with the added complexity.
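
To make the wrapper idea concrete, here is a minimal sketch of how such a 
buffering layer could aggregate arbitrary-width inputs into the fixed chunk 
width a Hitchhiker-style coder needs. The names ({{FixedWidthEncoder}}, 
{{BufferingEncoderWrapper}}) are illustrative assumptions for discussion, not 
the actual coder API:

{code:java}
import java.nio.ByteBuffer;

public class BufferingEncoderWrapper {

  // Stand-in for a coder (like Hitchhiker) that only accepts exactly
  // chunkSize() bytes per data unit on each encode call. Hypothetical name.
  public interface FixedWidthEncoder {
    int chunkSize();
    void encode(ByteBuffer[] dataUnits, ByteBuffer[] parityUnits);
  }

  private final FixedWidthEncoder coder;
  private final ByteBuffer[] pending; // one accumulation buffer per data unit

  public BufferingEncoderWrapper(FixedWidthEncoder coder, int numDataUnits) {
    this.coder = coder;
    this.pending = new ByteBuffer[numDataUnits];
    for (int i = 0; i < numDataUnits; i++) {
      pending[i] = ByteBuffer.allocate(coder.chunkSize());
    }
  }

  // Accepts inputs of any (equal) width, aggregates them into
  // chunkSize-aligned buffers, and encodes each full chunk, so the caller
  // never sees the underlying fixed width. A trailing partial chunk stays
  // buffered; a real implementation would also need a flush/padding step at
  // end of stream, and parityOut is assumed large enough for all emitted
  // parity.
  public void encode(ByteBuffer[] inputs, ByteBuffer[] parityOut) {
    while (inputs[0].hasRemaining()) {
      int toCopy = Math.min(inputs[0].remaining(), pending[0].remaining());
      for (int i = 0; i < inputs.length; i++) {
        // Copy toCopy bytes from each input into its pending chunk.
        ByteBuffer slice = inputs[i].duplicate();
        slice.limit(slice.position() + toCopy);
        pending[i].put(slice);
        inputs[i].position(inputs[i].position() + toCopy);
      }
      if (!pending[0].hasRemaining()) {
        // A full chunk is aggregated; hand it to the fixed-width coder.
        for (ByteBuffer b : pending) {
          b.flip();
        }
        coder.encode(pending, parityOut);
        for (ByteBuffer b : pending) {
          b.clear();
        }
      }
    }
  }
}
{code}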

In this way we need to make the following changes:
* Change the raw erasure coder API and remove the {{chunkSize}} parameter from 
the {{initialize}} method (see the sketch after this list);
* Change {{ECSchema}} and get rid of the {{chunkSize}} property;
* Allow configuring the {{cellSize}} value as an {{ECZone}} attribute in an 
XAttr;
* As the HH coder still needs to maintain a {{chunkSize}} value internally, it 
is suggested that the value be hard-coded.
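
As a rough illustration of the first change, the {{initialize}} signature could 
evolve as below. This is only a sketch based on the discussion, not the final 
committed API:

{code:java}
import java.nio.ByteBuffer;

// Before: the coder was initialized with a fixed chunkSize that encoder and
// decoder had to agree on up front:
//   void initialize(int numDataUnits, int numParityUnits, int chunkSize);

// After: chunkSize is dropped; each encode/decode call derives the data
// width from the buffers it is handed, which may vary between calls.
public interface RawErasureCoder {
  void initialize(int numDataUnits, int numParityUnits);

  // The width is implied by inputs[i].remaining() and may differ per call.
  void encode(ByteBuffer[] inputs, ByteBuffer[] outputs);
}
{code}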

[~umamaheswararao], [~szetszwo], [~jingzhao] and [~vinayrpet], could you 
comment on this if anything is unclear or a concern? Thanks.

[~jack_liuquan], do you think this design change works for implementing the 
Hitchhiker algorithm in HADOOP-11828? Thanks.

> Erasure Coding: whether to use chunkSize as the decode buffer size for 
> Datanode striped block reconstruction.
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8347
>                 URL: https://issues.apache.org/jira/browse/HDFS-8347
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>
> Currently the decode buffer size for Datanode striped block reconstruction is 
> configurable and can be smaller or larger than the chunkSize; this may cause 
> an issue for Hitchhiker, which may require encoding/decoding with the same 
> buffer size.


