[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

Kai Zheng (JIRA) Sun, 08 May 2016 05:38:23 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275583#comment-15275583
 ]


Kai Zheng commented on HADOOP-13010:
------------------------------------

bq. Sorry for the delay in reviewing. We've been busy.
Actually I'm sorry here for the pushing. It's understood because we're open 
source and sometimes out of sync. Even I wish to get all these refactoring and 
the following ISA-L coder be committed prior to the first 3.0 cut, I myself got 
pretty busy with other things recently so this comment is so late. :)

bq. Why do we have to have so many functions?
Well, if we don't have to feel afraid this change becomes too big and bump into 
HDFS said, I thought we can get rid of some ones.

bq. Surely the codec, numParityUnits, numDataUnits, whether it is XOR or not, 
etc. etc. should just be included in ErasureCoderOptions.
I agree it's a good idea to merge schema related info like numParityUnits and 
numDataUnits into {{ErasureCoderOptions}} from a fresh point of view looking at 
all of this. Not sure if it's good to have something like {{isXOR}} or 
{{isRS}}, because we'll have more coder algorithms other than the current both.

bq. Then we could just have one function ... {{createRawEncoder}}
OK, we can have it. Maybe it wouldn't be bad to have two shortcut methods 
additionally, createRSRawEncoder and createXORRawEncoder, because the both are 
the primitive, essential and most used ones in implementing advanced coders and 
HDFS side. I want the both to be outstanding and easily used.

bq. why does each particular type of encoder need its own factory? It seems 
like we just need a static function for each encoder type that takes a 
Configuration and ErasureCoderOptions, and we're good to go. We can locate 
these static functions via reflection.
I discussed about this with [~umamaheswararao] quite some ago when introducing 
these factories. There isn't a clear way to compose or reduce the full class 
name of a raw coder because it should be plugin-able and configurable. In 
current approach, for each codec, there could be some coder implementations, 
and for each, the corresponding coder factory can be configured.

bq. Can we just include the inputs, inputOffsets, erasedIndexes, outputs, 
outputOffsets in DecodingState?
It seems this can simplify the related functions, but am not sure it would make 
the codes more readable. The mentioned variables are very specific to 
encode/decode related calls using on-heap bytebuffer or byte array buffers. 
Maybe DecodingState could be kept simple not putting too many intermediate 
variables because the codes using of them are not suitable to be moved to the 
class.

> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

Reply via email to