[
https://issues.apache.org/jira/browse/HADOOP-13200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976220#comment-15976220
]
Andrew Wang commented on HADOOP-13200:
--------------------------------------
Hi [~timmyyao], thanks for working on this, looks great overall! Mostly just
nits.
One high-level comment is that I'd like to be precise with our use of
terminology like codec, codec name, coder, raw coder, and raw coder factory. I
made some more detailed comments based on my understanding, but would
appreciate if you could confirm and make your own pass as well.
Another high-level comment: should we restrict the names of third-party raw
coders? Previously we had namespacing from the Java package name. Maybe we
should do something like require classes that aren't in org.apache.hadoop to
have a name that starts with "thirdparty_" and recommend that implementers add
an additional namespace, e.g. "thirdparty_mycompany_". Maybe this should be
handled as part of the pluggable coder work, with related docs.
CodecRegistry:
* Rather than the comment saying "put native coder first", please say why we do
this. IIUC it's because this order is used as the default if the user does not
specify a fallback order.
* Nit: "code name" -> "codec name" in getCoderByName javadoc
* Rename getCodecs to getCodecNames to match getCoderNames
* Expanding the class Javadoc just a little bit would be good, explaining that
it maps codec names to coder factories that implement that codec, and that we
use ServiceLoader to find the factories.
* Since we're using new Java, can skip the additional type declaration for
generics on line 46 and 53.
* Technically, isn't this a RawCoderFactoryRegistry, rather than a
CodecRegistry?
* Similar, shouldn't getCoders be named getFactories, and other references to
Coders -> CoderFactory? We don't want to get this confused with ErasureCoder or
RawErasureCoder/Decoder.
CodecUtil:
This line is split unnecessarily:
{noformat}
fact = CodecRegistry.getInstance().getCoderByName(
codec, coderName);
{noformat}
* {{createRawCoderFactory}}, rename {{codec}} parameter to {{codecName}}.
* {{getCoderNames}}, rename function to {{getRawCoderNames}}, and rename
{{codec}} param to {{codecName}}
* {{createRawEncoderWithFallback}} and {{createRawDecoder}}: rename param
{{codec}} to {{codecName}}, {{coderNames}} to {{rawCoderNames}}, {{coderName}}
to {{rawCoderName}}
RawErasureCoderFactory:
* verifySchema is unused, remove?
> Seeking a better approach allowing to customize and configure erasure coders
> ----------------------------------------------------------------------------
>
> Key: HADOOP-13200
> URL: https://issues.apache.org/jira/browse/HADOOP-13200
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Tim Yao
> Priority: Blocker
> Labels: hdfs-ec-3.0-must-do
> Attachments: HADOOP-13200.02.patch, HADOOP-13200.03.patch,
> HADOOP-13200.04.patch, HADOOP-13200.05.patch
>
>
> This is a follow-on task for HADOOP-13010 as discussed over there. There may
> be some better approach allowing to customize and configure erasure coders
> than the current having raw coder factory, as [~cmccabe] suggested. Will copy
> the relevant comments here to continue the discussion.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]