Andrew Wang commented on HADOOP-13200:

Hi [~timmyyao], thanks for working on this, looks great overall! Mostly just 

One high-level comment is that I'd like to be precise with our use of 
terminology like codec, codec name, coder, raw coder, and raw coder factory. I 
made some more detailed comments based on my understanding, but would 
appreciate if you could confirm and make your own pass as well.

Another high-level comment: should we restrict the names of third-party raw 
coders? Previously we had namespacing from the Java package name. Maybe we 
should do something like require classes that aren't in org.apache.hadoop to 
have a name that starts with "thirdparty_" and recommend that implementers add 
an additional namespace, e.g. "thirdparty_mycompany_". Maybe this should be 
handled as part of the pluggable coder work, with related docs.

* Rather than the comment saying "put native coder first", please say why we do 
this. IIUC it's because this order is used as the default if the user does not 
specify a fallback order.
* Nit: "code name" -> "codec name" in getCoderByName javadoc
* Rename getCodecs to getCodecNames to match getCoderNames
* Expanding the class Javadoc just a little bit would be good, explaining that 
it maps codec names to coder factories that implement that codec, and that we 
use ServiceLoader to find the factories.
* Since we're using new Java, can skip the additional type declaration for 
generics on line 46 and 53.
* Technically, isn't this a RawCoderFactoryRegistry, rather than a 
* Similar, shouldn't getCoders be named getFactories, and other references to 
Coders -> CoderFactory? We don't want to get this confused with ErasureCoder or 


This line is split unnecessarily:

    fact = CodecRegistry.getInstance().getCoderByName(
            codec, coderName);

* {{createRawCoderFactory}}, rename {{codec}} parameter to {{codecName}}.
* {{getCoderNames}}, rename function to {{getRawCoderNames}}, and rename 
{{codec}} param to {{codecName}}
* {{createRawEncoderWithFallback}} and {{createRawDecoder}}: rename param 
{{codec}} to {{codecName}}, {{coderNames}} to {{rawCoderNames}}, {{coderName}} 
to {{rawCoderName}}

* verifySchema is unused, remove?

> Seeking a better approach allowing to customize and configure erasure coders
> ----------------------------------------------------------------------------
>                 Key: HADOOP-13200
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13200
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Tim Yao
>            Priority: Blocker
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HADOOP-13200.02.patch, HADOOP-13200.03.patch, 
> HADOOP-13200.04.patch, HADOOP-13200.05.patch
> This is a follow-on task for HADOOP-13010 as discussed over there. There may 
> be some better approach allowing to customize and configure erasure coders 
> than the current having raw coder factory, as [~cmccabe] suggested. Will copy 
> the relevant comments here to continue the discussion.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to