[
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275583#comment-15275583
]
Kai Zheng commented on HADOOP-13010:
------------------------------------
bq. Sorry for the delay in reviewing. We've been busy.
Actually I'm sorry here for the pushing. It's understood because we're open
source and sometimes out of sync. Even I wish to get all these refactoring and
the following ISA-L coder be committed prior to the first 3.0 cut, I myself got
pretty busy with other things recently so this comment is so late. :)
bq. Why do we have to have so many functions?
Well, if we don't have to feel afraid this change becomes too big and bump into
HDFS said, I thought we can get rid of some ones.
bq. Surely the codec, numParityUnits, numDataUnits, whether it is XOR or not,
etc. etc. should just be included in ErasureCoderOptions.
I agree it's a good idea to merge schema related info like numParityUnits and
numDataUnits into {{ErasureCoderOptions}} from a fresh point of view looking at
all of this. Not sure if it's good to have something like {{isXOR}} or
{{isRS}}, because we'll have more coder algorithms other than the current both.
bq. Then we could just have one function ... {{createRawEncoder}}
OK, we can have it. Maybe it wouldn't be bad to have two shortcut methods
additionally, createRSRawEncoder and createXORRawEncoder, because the both are
the primitive, essential and most used ones in implementing advanced coders and
HDFS side. I want the both to be outstanding and easily used.
bq. why does each particular type of encoder need its own factory? It seems
like we just need a static function for each encoder type that takes a
Configuration and ErasureCoderOptions, and we're good to go. We can locate
these static functions via reflection.
I discussed about this with [~umamaheswararao] quite some ago when introducing
these factories. There isn't a clear way to compose or reduce the full class
name of a raw coder because it should be plugin-able and configurable. In
current approach, for each codec, there could be some coder implementations,
and for each, the corresponding coder factory can be configured.
bq. Can we just include the inputs, inputOffsets, erasedIndexes, outputs,
outputOffsets in DecodingState?
It seems this can simplify the related functions, but am not sure it would make
the codes more readable. The mentioned variables are very specific to
encode/decode related calls using on-heap bytebuffer or byte array buffers.
Maybe DecodingState could be kept simple not putting too many intermediate
variables because the codes using of them are not suitable to be moved to the
class.
> Refactor raw erasure coders
> ---------------------------
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch,
> HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to
> rely class inheritance to reuse the codes, instead they can be moved to some
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a
> state holder to keep some checking results for later reuse during an
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet
> for the moment and also incurs big impact. I do wish the end result by this
> refactoring will make all the levels more clear and easier to follow.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]