[
https://issues.apache.org/jira/browse/HADOOP-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978233#comment-15978233
]
Tim Yao commented on HADOOP-13200:
----------------------------------
Hi [~andrew.wang] and [~jojochuang],
Thanks a lot for your comments and suggestions; they are really helpful. I
will respond to each of them below.
{quote}
One high-level comment is that I'd like to be precise with our use of
terminology like codec, codec name, coder, raw coder, and raw coder factory.
{quote}
Being precise is good, and I agree with most of your points. I will do my
best to go through the code and make the terminology consistent. That said, I
think a small amount of abstraction is worthwhile to keep the code flexible
and easier to evolve in the future.
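To make the distinctions concrete, here is a minimal sketch of how I read the
terminology mapping onto types; the names below are illustrative, not
necessarily the exact classes in the patch:
{code:java}
// Illustrative sketch only; the names here are hypothetical.
// A "codec" is an EC algorithm family, e.g. "rs" or "xor".
// A "coder name" labels one concrete implementation of that codec,
// e.g. "rs_native" vs. "rs_java".
// A raw coder factory creates the raw encoder/decoder pair for one
// such implementation.
public interface RawCoderFactory {
  String getCodecName(); // codec this implementation belongs to, e.g. "rs"
  String getCoderName(); // this implementation's name, e.g. "rs_native"
}
{code}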
{quote}
Another high-level comment: should we restrict the names of third-party raw
coders?
{quote}
This is a good suggestion for standardizing user-defined implementations. I
examined other pluggable components in Hadoop, such as CompressionCodec, and
they do not appear to impose such a restriction. Still, I think we could add
one in future development.
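If we do add a restriction later, it could be as simple as validating names
when factories are registered. A hypothetical sketch (the pattern and the
method are made up for illustration):
{code:java}
import java.util.regex.Pattern;

// Hypothetical check, not in the current patch: restrict coder names to
// lowercase letters, digits and underscores so that configuration keys
// derived from them stay predictable.
private static final Pattern CODER_NAME_PATTERN =
    Pattern.compile("[a-z][a-z0-9_]*");

static void validateCoderName(String coderName) {
  // Pattern#matcher(...).matches() tests the whole string, so no anchors
  // are needed in the pattern.
  if (!CODER_NAME_PATTERN.matcher(coderName).matches()) {
    throw new IllegalArgumentException(
        "Invalid raw coder name: " + coderName);
  }
}
{code}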
{quote}
Technically, isn't this a RawCoderFactoryRegistry, rather than a CodecRegistry?
{quote}
In CodecRegistry, all coders are registered and categorized by their codec.
At a high level it manages all the information for the different codecs, so
CodecRegistry sounds more suitable to me, and the high-level name leaves more
room for future extension of this component.
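To illustrate why the codec-level name fits, here is a rough sketch of the
registry's shape; the ServiceLoader-based discovery shown is just one
plausible mechanism, and the details may differ from the actual patch:
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Rough sketch only; details may differ from the actual patch.
public final class CodecRegistry {
  // codec name (e.g. "rs") -> factories of all registered implementations
  private final Map<String, List<RawCoderFactory>> coderMap = new HashMap<>();

  CodecRegistry() {
    // Discover built-in and third-party factories, then group them by codec.
    for (RawCoderFactory factory : ServiceLoader.load(RawCoderFactory.class)) {
      coderMap.computeIfAbsent(factory.getCodecName(),
          k -> new ArrayList<>()).add(factory);
    }
  }

  List<RawCoderFactory> getCoders(String codecName) {
    List<RawCoderFactory> factories = coderMap.get(codecName);
    return factories == null
        ? Collections.<RawCoderFactory>emptyList() : factories;
  }
}
{code}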
{quote}
I am not familiar with Java 8 Lambda functions, but if I understand it
correctly, CodecRegistry#getCoderNames returns a new String list every time it
is called. Because this method is called whenever a DFSStripedOutputStream
object is instantiated, it could cause extra heap usage over the long run.
{quote}
This is a good point. I am not too worried about it, though; the JVM should
handle short-lived temporary objects well, so I would prefer to keep this
part simple. Avoiding the allocation by maintaining another map of coder
names would add its own complexity overhead.
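That said, if the allocation ever shows up in profiles, one hypothetical fix
would be to compute the name list once per codec and hand out a cached
immutable copy, rather than rebuilding it for every DFSStripedOutputStream.
A sketch of that alternative (not what the current patch does):
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical caching variant, shown only for comparison: the coder-name
// list for each codec is computed once and reused, instead of being
// rebuilt on every call.
private final Map<String, List<String>> coderNameCache =
    new ConcurrentHashMap<>();

public List<String> getCoderNames(String codecName) {
  return coderNameCache.computeIfAbsent(codecName, name -> {
    List<String> names = new ArrayList<>();
    for (RawCoderFactory factory : getCoders(name)) {
      names.add(factory.getCoderName());
    }
    return Collections.unmodifiableList(names);
  });
}
{code}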
For the other comments that I did not mention, they are all reasonable and I
fully agree; I will make the changes accordingly and provide an updated patch
soon.
> Seeking a better approach allowing to customize and configure erasure coders
> ----------------------------------------------------------------------------
>
> Key: HADOOP-13200
> URL: https://issues.apache.org/jira/browse/HADOOP-13200
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Tim Yao
> Priority: Blocker
> Labels: hdfs-ec-3.0-must-do
> Attachments: HADOOP-13200.02.patch, HADOOP-13200.03.patch,
> HADOOP-13200.04.patch, HADOOP-13200.05.patch
>
>
> This is a follow-on task for HADOOP-13010, as discussed over there. There
> may be a better approach to customizing and configuring erasure coders than
> the current raw coder factory, as [~cmccabe] suggested. Will copy the
> relevant comments here to continue the discussion.