[
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289361#comment-15289361
]
Colin Patrick McCabe commented on HADOOP-13010:
-----------------------------------------------
bq. Not sure if it's good to have something like isXOR or isRS, because we'll
have more coder algorithms other than the current both.
That's a fair point. It seems unlikely that we need an isXOR or isRS method.
bq. OK, we can have \[createRawEncoder\]. Maybe it wouldn't be bad to have two
shortcut methods additionally, createRSRawEncoder and createXORRawEncoder,
because the both are the primitive, essential and most used ones in
implementing advanced coders and HDFS side. I want the both to be outstanding
and easily used.
It seems better just to have one function, {{createRawEncoder}}, than to have
lots of functions for every type of encoder.
bq. I discussed about this with Uma Maheswara Rao G quite some ago when
introducing these factories. There isn't a clear way to compose or reduce the
full class name of a raw coder because it should be plugin-able and
configurable. In current approach, for each codec, there could be some coder
implementations, and for each, the corresponding coder factory can be
configured.
We discussed this offline and I think the conclusion is that we don't need the
factories for anything.
We can just have a configuration key like {{erasure.coder.algorithm}} and then
code like this:
{code}
RawErasureEncoder createRawEncoder(Configuration conf) {
String classPrefix = conf.get("erasure.coder.algorithm",
DEFAULT_ERASURE_CODER_ALGORITHM);
String name = classPrefix + "Encoder";
Constructor<RawErasureEncoder> ctor =
classLoader.loadClass(name).getConstructor(Configuration.class);
return ctor.newInstance(conf);
}
RawErasureDecoder createRawDecoder(Configuration conf) {
String classPrefix = conf.get("erasure.coder.algorithm",
DEFAULT_ERASURE_CODER_ALGORITHM);
String name = classPrefix + "Decoder";
Constructor<RawErasureEncoder> ctor =
classLoader.loadClass(name).getConstructor(Configuration.class);
return ctor.newInstance(conf);
}
{code}
bq. It seems this can simplify the related functions, but am not sure it would
make the codes more readable. The mentioned variables are very specific to
encode/decode related calls using on-heap bytebuffer or byte array buffers.
Maybe DecodingState could be kept simple not putting too many intermediate
variables because the codes using of them are not suitable to be moved to the
class.
Reducing the number of function parameters from 8 or 9 to 1 or 2 seems like it
would make the code much more readable. I don't understand what the rationale
is for keeping these parameters out of DecodingState. Perhaps we could discuss
this in a follow-on JIRA, though.
> Refactor raw erasure coders
> ---------------------------
>
> Key: HADOOP-13010
> URL: https://issues.apache.org/jira/browse/HADOOP-13010
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Kai Zheng
> Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch,
> HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to
> rely class inheritance to reuse the codes, instead they can be moved to some
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a
> state holder to keep some checking results for later reuse during an
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet
> for the moment and also incurs big impact. I do wish the end result by this
> refactoring will make all the levels more clear and easier to follow.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]