[
https://issues.apache.org/jira/browse/HDFS-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539108#comment-14539108
]
Kai Zheng commented on HDFS-8347:
---------------------------------
Having had a detailed discussion with [~zhz], [~hitliuyi] and others, it looks
like we have a good approach now.
Allowing the coder caller to pass in and encode/decode variable-width data is
the most flexible and ideal option for the HDFS client and DataNode, and the
typical RS coder supports this naturally, so we can get rid of the constraint
of a predefined {{chunkSize}} setting shared between the encoder and decoder.
Since the Hitchhiker coder has to use the same data width in both encoding and
decoding, we could add a wrapper around it that aggregates and splits the input
data, so it can accept arbitrary-width inputs while aligning them with the
underlying encoding/decoding width. Originally this aggregation and splitting
was expected to be done by the coder caller, but that looks hard in practice
because handling the multiple striped DataNode streamers is already so complex;
it is better to move this tweak into the coder layer for the caller's ease.
The erasure coder is currently designed, implemented, and testable outside of
HDFS specifics, so it is relatively easy to get right even with this added
complexity.
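To make the wrapper idea concrete, here is a minimal sketch of the aggregation
and splitting it would do. All names ({{BufferingEncoderWrapper}},
{{FixedWidthEncoder}}) and signatures are hypothetical, not the actual Hadoop
interfaces; the point is only that callers can write arbitrary-width data while
the fixed-width coder underneath always sees full chunks.

```java
// Illustrative sketch only: names and signatures are hypothetical, not Hadoop's.
import java.nio.ByteBuffer;

public class BufferingEncoderWrapper {

  /** Hypothetical underlying fixed-width coder (e.g. Hitchhiker). */
  public interface FixedWidthEncoder {
    void encode(byte[] chunk); // expects exactly chunkSize bytes
  }

  private final FixedWidthEncoder coder;
  private final ByteBuffer pending;

  // In the proposal the chunk size would be hard-coded inside the coder;
  // it is a constructor parameter here only to keep the sketch small.
  public BufferingEncoderWrapper(FixedWidthEncoder coder, int chunkSize) {
    this.coder = coder;
    this.pending = ByteBuffer.allocate(chunkSize);
  }

  /** Accept input of any width; aggregate/split to the fixed chunk width. */
  public void write(byte[] data, int off, int len) {
    while (len > 0) {
      int n = Math.min(len, pending.remaining());
      pending.put(data, off, n);
      off += n;
      len -= n;
      if (!pending.hasRemaining()) {      // a full chunk is ready
        byte[] chunk = new byte[pending.capacity()];
        pending.flip();
        pending.get(chunk);
        coder.encode(chunk);              // delegate at the fixed width
        pending.clear();
      }
    }
  }
}
```

A flush/padding step for the final partial chunk is omitted here; a real
wrapper would also have to handle that on the encode side and undo it on the
decode side.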
This requires the following changes:
* Change the raw erasure coder API and remove the {{chunkSize}} parameter from
the {{initialize}} method;
* Change {{ECSchema}} and get rid of the {{chunkSize}} property;
* Allow configuring the {{cellSize}} value as an {{ECZone}} attribute in the
XAttr;
* As the HH coder still needs to maintain a chunkSize value internally, it is
suggested that the value be hard-coded.
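To illustrate the first change above, here is a rough before/after sketch of a
raw encoder API without {{chunkSize}}, with a trivial XOR coder showing that
the data width can be derived from the buffers on each call. The interface and
class names are approximations for illustration, not the exact Hadoop types.

```java
// Hypothetical revised interface: no chunkSize fixed at initialization time.
import java.nio.ByteBuffer;

interface RawErasureEncoder {
  // Before: void initialize(int numDataUnits, int numParityUnits, int chunkSize);
  void initialize(int numDataUnits, int numParityUnits);

  // The width is inputs[0].remaining(), decided per call by the caller.
  void encode(ByteBuffer[] inputs, ByteBuffer[] outputs);
}

/** Trivial single-parity XOR coder, just to show variable-width encode. */
class XorEncoder implements RawErasureEncoder {
  private int numDataUnits;

  public void initialize(int numDataUnits, int numParityUnits) {
    this.numDataUnits = numDataUnits;   // note: no chunkSize stored
  }

  public void encode(ByteBuffer[] inputs, ByteBuffer[] outputs) {
    int width = inputs[0].remaining();  // width comes from this call's buffers
    for (int i = 0; i < width; i++) {
      byte parity = 0;
      for (int d = 0; d < numDataUnits; d++) {
        parity ^= inputs[d].get(inputs[d].position() + i); // absolute reads
      }
      outputs[0].put(parity);
    }
  }
}
```

An RS coder fits this shape directly; the Hitchhiker coder would sit behind the
aggregating wrapper described earlier so that its internal fixed width stays
hidden from callers.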
[~umamaheswararao], [~szetszwo], [~jingzhao] and [~vinayrpet], could you
comment on this if anything is unclear or concerning? Thanks.
[~jack_liuquan], does this design change work for implementing the Hitchhiker
algorithm in HADOOP-11828? Thanks.
> Erasure Coding: whether to use chunkSize as the decode buffer size for
> Datanode striped block reconstruction.
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-8347
> URL: https://issues.apache.org/jira/browse/HDFS-8347
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kai Zheng
>
> Currently the decode buffer size for Datanode striped block reconstruction is
> configurable and can be smaller or larger than the chunk size; this may cause
> an issue for Hitchhiker, which may require encoding and decoding with the
> same buffer size.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)