[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

Colin Patrick McCabe (JIRA) Wed, 18 May 2016 10:16:38 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289361#comment-15289361
 ]


Colin Patrick McCabe commented on HADOOP-13010:
-----------------------------------------------

bq. Not sure if it's good to have something like isXOR or isRS, because we'll 
have more coder algorithms other than the current both.

That's a fair point.  It seems unlikely that we need an isXOR or isRS method.

bq. OK, we can have \[createRawEncoder\]. Maybe it wouldn't be bad to have two 
shortcut methods additionally, createRSRawEncoder and createXORRawEncoder, 
because the both are the primitive, essential and most used ones in 
implementing advanced coders and HDFS side. I want the both to be outstanding 
and easily used.

It seems better just to have one function, {{createRawEncoder}}, than to have 
lots of functions for every type of encoder.

bq. I discussed about this with Uma Maheswara Rao G quite some ago when 
introducing these factories. There isn't a clear way to compose or reduce the 
full class name of a raw coder because it should be plugin-able and 
configurable. In current approach, for each codec, there could be some coder 
implementations, and for each, the corresponding coder factory can be 
configured.

We discussed this offline and I think the conclusion is that we don't need the 
factories for anything.

We can just have a configuration key like {{erasure.coder.algorithm}} and then 
code like this:

{code}
RawErasureEncoder createRawEncoder(Configuration conf) {
  String classPrefix = conf.get("erasure.coder.algorithm", 
DEFAULT_ERASURE_CODER_ALGORITHM);
  String name = classPrefix + "Encoder";
  Constructor<RawErasureEncoder> ctor = 
    classLoader.loadClass(name).getConstructor(Configuration.class);
  return ctor.newInstance(conf);
}

RawErasureDecoder createRawDecoder(Configuration conf) {
  String classPrefix = conf.get("erasure.coder.algorithm", 
DEFAULT_ERASURE_CODER_ALGORITHM);
  String name = classPrefix + "Decoder";
  Constructor<RawErasureEncoder> ctor = 
    classLoader.loadClass(name).getConstructor(Configuration.class);
  return ctor.newInstance(conf);
}
{code}

bq. It seems this can simplify the related functions, but am not sure it would 
make the codes more readable. The mentioned variables are very specific to 
encode/decode related calls using on-heap bytebuffer or byte array buffers. 
Maybe DecodingState could be kept simple not putting too many intermediate 
variables because the codes using of them are not suitable to be moved to the 
class.

Reducing the number of function parameters from 8 or 9 to 1 or 2 seems like it 
would make the code much more readable.  I don't understand what the rationale 
is for keeping these parameters out of DecodingState.  Perhaps we could discuss 
this in a follow-on JIRA, though.

> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely class inheritance to reuse the codes, instead they can be moved to some 
> utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels as doing so isn't clear yet 
> for the moment and also incurs big impact. I do wish the end result by this 
> refactoring will make all the levels more clear and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

Reply via email to