[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

Kai Zheng (JIRA) Wed, 18 May 2016 11:36:22 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289544#comment-15289544
 ]


Kai Zheng commented on HADOOP-13010:
------------------------------------

Hi Colin,

Thanks for the comments. About the factories, let me explain the real 
problem in detail, since the f2f discussion didn't get into the details due 
to time constraints.

We may have the following codecs in the 1st level:
rs-legacy, rs-default (both belonging to RS)
xor,
hh or hitchhiker,
lrc,
...

Each codec may use one or more raw coders, and each such coder may have 
different implementations. For example, the rs-default codec has two coder 
implementations (the pure Java one and the ISA-L one). Users may add their 
own coder implementation for a codec, perhaps for better performance.

So that's why I would have a configuration key like this:
o.a.h.io.erasurecode.codec.(codec-name).rawcoder: (whatever value to be used to 
create or load the coder).
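
As a minimal sketch of how such a per-codec key could be built and read: the 
class and method names below are hypothetical, the key prefix is copied as 
abbreviated above (the real key may differ), java.util.Properties stands in 
for the Hadoop configuration, and "isa-l" / "pure-java" are only example 
values.

```java
import java.util.Properties;

public class RawCoderConfigSketch {
    // Prefix and suffix follow the key pattern written above; the prefix is
    // kept in its abbreviated form, so treat it as a placeholder.
    public static final String KEY_PREFIX = "o.a.h.io.erasurecode.codec.";
    public static final String KEY_SUFFIX = ".rawcoder";

    /** Builds the configuration key for the given codec name. */
    public static String rawCoderKey(String codecName) {
        return KEY_PREFIX + codecName + KEY_SUFFIX;
    }

    /** Returns the configured raw coder for a codec, or the fallback if unset. */
    public static String lookupRawCoder(Properties conf, String codecName,
                                        String fallback) {
        return conf.getProperty(rawCoderKey(codecName), fallback);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Hypothetical value: select the ISA-L coder for the rs-default codec.
        conf.setProperty(rawCoderKey("rs-default"), "isa-l");

        System.out.println(lookupRawCoder(conf, "rs-default", "pure-java"));
        // The xor codec has no entry, so the fallback is returned.
        System.out.println(lookupRawCoder(conf, "xor", "pure-java"));
    }
}
```

With one key per codec, each codec can select its coder implementation 
independently of the others.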

Currently we configure a factory to create the encoder and decoder for a 
coder implementation. I agree there could be a better option here. While 
discussing this in detail with Andrew yesterday in the SF office, I wondered 
whether we could achieve the same effect without the factories by using the 
Java service loader.

First, we can add codec-name and coder-name to the raw coder, so each coder 
will have a codec-name and coder-name when it's created.

The built-in coders then have fixed codec-name and coder-name values. 
Customized coders will be loaded via the service loader.

Eventually we will have all the raw erasure coders loaded and created, and 
then we can set up mappings from codec-name to coder-name and from coder-name 
to the coder class or instance.
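
A rough sketch of this idea, under stated assumptions: RawCoder, NamedCoder 
and RawCoderRegistrySketch are hypothetical names, not existing Hadoop 
classes, and the coder names are only examples. Built-in coders are 
registered directly, while custom ones are discovered with 
java.util.ServiceLoader, which finds nothing unless a provider is declared 
under META-INF/services.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

public class RawCoderRegistrySketch {
    /** Hypothetical SPI: each raw coder reports its codec and coder name. */
    public interface RawCoder {
        String codecName();
        String coderName();
    }

    /** Simple named coder, standing in for a built-in implementation. */
    public static final class NamedCoder implements RawCoder {
        private final String codec;
        private final String coder;

        public NamedCoder(String codec, String coder) {
            this.codec = codec;
            this.coder = coder;
        }

        @Override public String codecName() { return codec; }
        @Override public String coderName() { return coder; }
    }

    // codec-name -> (coder-name -> coder instance), in registration order.
    private final Map<String, Map<String, RawCoder>> byCodec = new LinkedHashMap<>();

    /** Registers a coder; a later coder with the same names replaces an earlier one. */
    public void register(RawCoder coder) {
        byCodec.computeIfAbsent(coder.codecName(), k -> new LinkedHashMap<>())
               .put(coder.coderName(), coder);
    }

    /** Loads customized coders declared by providers on the classpath. */
    public void loadCustomCoders() {
        for (RawCoder coder : ServiceLoader.load(RawCoder.class)) {
            register(coder);
        }
    }

    /** Returns the coder names known for a codec (empty if none). */
    public List<String> coderNames(String codecName) {
        Map<String, RawCoder> coders = byCodec.get(codecName);
        return coders == null ? new ArrayList<String>()
                              : new ArrayList<>(coders.keySet());
    }

    /** Returns the coder for the given names, or null if it is not registered. */
    public RawCoder get(String codecName, String coderName) {
        Map<String, RawCoder> coders = byCodec.get(codecName);
        return coders == null ? null : coders.get(coderName);
    }

    public static void main(String[] args) {
        RawCoderRegistrySketch registry = new RawCoderRegistrySketch();
        // Built-in coders with fixed names (example names only).
        registry.register(new NamedCoder("rs-default", "pure-java"));
        registry.register(new NamedCoder("rs-default", "isa-l"));
        registry.register(new NamedCoder("xor", "pure-java"));
        registry.loadCustomCoders();
        System.out.println(registry.coderNames("rs-default"));
    }
}
```

Because custom coders are loaded after the built-in ones in this sketch, a 
custom coder reusing a built-in codec-name and coder-name pair would replace 
the built-in entry; whether that is desirable is a design choice left open 
here.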

Does this sound good to you? If it works, perhaps we could do this in a 
follow-on task?

Thanks again!


> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, 
> HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor raw erasure coders according to some comments received so 
> far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], better not to 
> rely on class inheritance to reuse code; instead, the shared code can be 
> moved to some utility.
> * Suggested by [~jingzhao] somewhere quite some time ago, better to have a 
> state holder to keep some checking results for later reuse during an 
> encode/decode call.
> This would not get rid of some inheritance levels, as how to do so isn't 
> clear yet and it would also have a big impact. I do hope the end result of 
> this refactoring will make all the levels clearer and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

