[
https://issues.apache.org/jira/browse/HADOOP-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978233#comment-15978233
]
Tim Yao commented on HADOOP-13200:
----------------------------------
Hi [~andrew.wang] and [~jojochuang],
Thanks a lot for your comments and suggestions; they are really helpful. I
will respond to each of them below.
{quote}
One high-level comment is that I'd like to be precise with our use of
terminology like codec, codec name, coder, raw coder, and raw coder factory.
{quote}
Being precise is good, and I agree with most of your points. I will do my
best to go through the code and make the terminology consistent. That said, I
think a small amount of abstraction is worthwhile to keep the code flexible
and easier to evolve in the future.
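To make the distinctions concrete, here is a minimal sketch of how I read the
terminology mapping onto types; the names below are illustrative, not
necessarily the exact classes in the patch:
{code:java}
// Illustrative sketch only; the names here are hypothetical.
// A "codec" is an EC algorithm family, e.g. "rs" or "xor".
// A "coder name" labels one concrete implementation of that codec,
// e.g. "rs_native" vs. "rs_java".
// A raw coder factory creates the raw encoder/decoder pair for one
// such implementation.
public interface RawCoderFactory {
  String getCodecName(); // codec this implementation belongs to, e.g. "rs"
  String getCoderName(); // this implementation's name, e.g. "rs_native"
}
{code}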
{quote}
Another high-level comment: should we restrict the names of third-party raw
coders?
{quote}
This is a good suggestion for standardizing user-defined implementations. I
examined other pluggable components in Hadoop, such as CompressionCodec, and
they do not appear to impose such a restriction. Still, I think we could add
one in future development.
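If we do add a restriction later, it could be as simple as validating names
when factories are registered. A hypothetical sketch (the pattern and the
method are made up for illustration):
{code:java}
import java.util.regex.Pattern;

// Hypothetical check, not in the current patch: restrict coder names to
// lowercase letters, digits and underscores so that configuration keys
// derived from them stay predictable.
private static final Pattern CODER_NAME_PATTERN =
    Pattern.compile("[a-z][a-z0-9_]*");

static void validateCoderName(String coderName) {
  // Pattern#matcher(...).matches() tests the whole string, so no anchors
  // are needed in the pattern.
  if (!CODER_NAME_PATTERN.matcher(coderName).matches()) {
    throw new IllegalArgumentException(
        "Invalid raw coder name: " + coderName);
  }
}
{code}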
{quote}
Technically, isn't this a RawCoderFactoryRegistry, rather than a CodecRegistry?
{quote}
In CodecRegistry, all coders are registered and categorized by their codec.
At a high level it manages all the information for the different codecs, so
CodecRegistry sounds more suitable to me, and the high-level name leaves more
room for future extension of this component.
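To illustrate why the codec-level name fits, here is a rough sketch of the
registry's shape; the ServiceLoader-based discovery shown is just one
plausible mechanism, and the details may differ from the actual patch:
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Rough sketch only; details may differ from the actual patch.
public final class CodecRegistry {
  // codec name (e.g. "rs") -> factories of all registered implementations
  private final Map<String, List<RawCoderFactory>> coderMap = new HashMap<>();

  CodecRegistry() {
    // Discover built-in and third-party factories, then group them by codec.
    for (RawCoderFactory factory : ServiceLoader.load(RawCoderFactory.class)) {
      coderMap.computeIfAbsent(factory.getCodecName(),
          k -> new ArrayList<>()).add(factory);
    }
  }

  List<RawCoderFactory> getCoders(String codecName) {
    List<RawCoderFactory> factories = coderMap.get(codecName);
    return factories == null
        ? Collections.<RawCoderFactory>emptyList() : factories;
  }
}
{code}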
{quote}
I am not familiar with Java 8 Lambda functions, but if I understand it
correctly, CodecRegistry#getCoderNames returns a new String list every time it
is called. Because this method is called whenever a DFSStripedOutputStream
object is instantiated, it could cause extra heap usage over the long run.
{quote}
This is a good point. I am not too worried about it, though; the JVM should
handle short-lived temporary objects well, so I would prefer to keep this
part simple. Avoiding the allocation by maintaining another map of coder
names would add its own complexity overhead.
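That said, if the allocation ever shows up in profiles, one hypothetical fix
would be to compute the name list once per codec and hand out a cached
immutable copy, rather than rebuilding it for every DFSStripedOutputStream.
A sketch of that alternative (not what the current patch does):
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical caching variant, shown only for comparison: the coder-name
// list for each codec is computed once and reused, instead of being
// rebuilt on every call.
private final Map<String, List<String>> coderNameCache =
    new ConcurrentHashMap<>();

public List<String> getCoderNames(String codecName) {
  return coderNameCache.computeIfAbsent(codecName, name -> {
    List<String> names = new ArrayList<>();
    for (RawCoderFactory factory : getCoders(name)) {
      names.add(factory.getCoderName());
    }
    return Collections.unmodifiableList(names);
  });
}
{code}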
For the other comments that I did not mention, they are all reasonable and I
fully agree; I will make the changes accordingly and provide an updated patch
soon.
> Seeking a better approach allowing to customize and configure erasure coders
> ----------------------------------------------------------------------------
>
> Key: HADOOP-13200
> URL: https://issues.apache.org/jira/browse/HADOOP-13200
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Tim Yao
> Priority: Blocker
> Labels: hdfs-ec-3.0-must-do
> Attachments: HADOOP-13200.02.patch, HADOOP-13200.03.patch,
> HADOOP-13200.04.patch, HADOOP-13200.05.patch
>
>
> This is a follow-on task for HADOOP-13010, as discussed over there. There
> may be a better approach to customizing and configuring erasure coders than
> the current raw coder factory, as [~cmccabe] suggested. Will copy the
> relevant comments here to continue the discussion.