[ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933994#comment-15933994
 ] 

Kai Zheng commented on HDFS-7337:
---------------------------------

[~andrew.wang], [~zhz], [~rakeshr] or anybody

To keep things simple: given the code we already have, the goal here seems 
easier to reach now.

In {{ErasureCodingPolicyManager}} we have these built-in EC policies:
{code}
  private static final int DEFAULT_CELLSIZE = 64 * 1024;
  private static final ErasureCodingPolicy SYS_POLICY1 =
      new ErasureCodingPolicy(ErasureCodeConstants.RS_6_3_SCHEMA,
          DEFAULT_CELLSIZE, HdfsConstants.RS_6_3_POLICY_ID);
  private static final ErasureCodingPolicy SYS_POLICY2 =
      new ErasureCodingPolicy(ErasureCodeConstants.RS_3_2_SCHEMA,
          DEFAULT_CELLSIZE, HdfsConstants.RS_3_2_POLICY_ID);
  private static final ErasureCodingPolicy SYS_POLICY3 =
      new ErasureCodingPolicy(ErasureCodeConstants.RS_6_3_LEGACY_SCHEMA,
          DEFAULT_CELLSIZE, HdfsConstants.RS_6_3_LEGACY_POLICY_ID);
  private static final ErasureCodingPolicy SYS_POLICY4 =
      new ErasureCodingPolicy(ErasureCodeConstants.XOR_2_1_SCHEMA,
          DEFAULT_CELLSIZE, HdfsConstants.XOR_2_1_POLICY_ID);
  private static final ErasureCodingPolicy SYS_POLICY5 =
      new ErasureCodingPolicy(ErasureCodeConstants.RS_10_4_SCHEMA,
          DEFAULT_CELLSIZE, HdfsConstants.RS_10_4_POLICY_ID);
{code}

In {{ErasureCodeConstants}} we have these schemas used by the above policies:
{code}
  public static final String RS_CODEC_NAME = "rs";
  public static final String RS_LEGACY_CODEC_NAME = "rs-legacy";
  public static final String XOR_CODEC_NAME = "xor";
  public static final String HHXOR_CODEC_NAME = "hhxor";

  public static final ECSchema RS_6_3_SCHEMA = new ECSchema(
      RS_CODEC_NAME, 6, 3);

  public static final ECSchema RS_3_2_SCHEMA = new ECSchema(
      RS_CODEC_NAME, 3, 2);

  public static final ECSchema RS_6_3_LEGACY_SCHEMA = new ECSchema(
      RS_LEGACY_CODEC_NAME, 6, 3);

  public static final ECSchema XOR_2_1_SCHEMA = new ECSchema(
      XOR_CODEC_NAME, 2, 1);

  public static final ECSchema RS_10_4_SCHEMA = new ECSchema(
      RS_CODEC_NAME, 10, 4);
{code}
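
Policy names such as RS-6-3-64k appear to combine the codec name, the data and parity unit counts, and the cell size. A minimal sketch of that naming convention follows. It assumes the name is derived this way, and {{PolicyNameSketch}} and {{compose}} are hypothetical stand-ins, not the real {{ErasureCodingPolicy}} API:

```java
import java.util.Locale;

// Hypothetical stand-in for illustration; not the real Hadoop ErasureCodingPolicy.
final class PolicyNameSketch {

  /** Builds a name like "RS-6-3-64k" from codec, unit counts and cell size in bytes. */
  static String compose(String codec, int dataUnits, int parityUnits,
      int cellSizeBytes) {
    if (cellSizeBytes <= 0 || cellSizeBytes % 1024 != 0) {
      throw new IllegalArgumentException(
          "Cell size must be a positive multiple of 1024: " + cellSizeBytes);
    }
    return codec.toUpperCase(Locale.ROOT) + "-" + dataUnits + "-" + parityUnits
        + "-" + (cellSizeBytes / 1024) + "k";
  }

  public static void main(String[] args) {
    // Same parameters as RS_6_3_SCHEMA with DEFAULT_CELLSIZE.
    System.out.println(compose("rs", 6, 3, 64 * 1024)); // prints "RS-6-3-64k"
    // Same parameters as XOR_2_1_SCHEMA with DEFAULT_CELLSIZE.
    System.out.println(compose("xor", 2, 1, 64 * 1024)); // prints "XOR-2-1-64k"
  }
}
```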

HDFS-11314 allows enforcing a set of enabled EC policies on the NameNode, 
as follows:
{code}
<property>
  <name>dfs.namenode.ec.policies.enabled</name>
  <value>RS-6-3-64k, RS-10-4-128k</value>
  <description>Comma-delimited list of enabled erasure coding policies.
    The NameNode will enforce this when setting an erasure coding policy
    on a directory.
  </description>
</property>
{code}
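
To illustrate how such a comma-delimited value could be enforced, here is a minimal sketch in plain Java. It does not use the actual NameNode code path; {{EnabledPoliciesSketch}}, {{parse}} and {{checkEnabled}} are hypothetical names introduced only for this example:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not the real NameNode implementation.
final class EnabledPoliciesSketch {

  /** Splits a comma-delimited value, trimming whitespace and skipping empty entries. */
  static Set<String> parse(String value) {
    Set<String> names = new LinkedHashSet<>();
    for (String part : value.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names;
  }

  /** Rejects a policy name that is not in the enabled set. */
  static void checkEnabled(Set<String> enabled, String policyName) {
    if (!enabled.contains(policyName)) {
      throw new IllegalArgumentException("Policy '" + policyName
          + "' is not enabled; enabled policies: " + enabled);
    }
  }

  public static void main(String[] args) {
    Set<String> enabled = parse("RS-6-3-64k, RS-10-4-128k");
    checkEnabled(enabled, "RS-6-3-64k");      // accepted
    try {
      checkEnabled(enabled, "XOR-2-1-64k");   // not in the list, so rejected
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```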

The raw coder implementation used for a codec can be configured as follows, 
using the {{rs}} codec as an example:
{code}
<property>
  <name>io.erasurecode.codec.rs.rawcoder</name>
  <value>org.apache.hadoop.io.erasurecode.rawcoder.RSRawErasureCoderFactory</value>
  <description>
    Raw coder implementation for the rs codec. The default value is a
    pure Java implementation. There is also a native implementation. Its value
    is org.apache.hadoop.io.erasurecode.rawcoder.NativeRSRawErasureCoderFactory.
  </description>
</property>
{code}

So given the above, what is still lacking is a mechanism (say, an XML file) 
that lets admin users define their own EC schemas and policies on the 
NameNode side. The reasons to do this: 
* Users want to try a different codec;
* Users want to use different codec parameters, for example 10 + 4 rather 
than 6 + 3 for the RS codec;
* Users want to try a cell size other than 64k.
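
A rough sketch of what such a file and its loading might look like, purely as an illustration. The file layout (the {{ec-policies}} and {{policy}} elements and their attributes) and the {{UserPolicyFileSketch}} class are invented for this example and are not an existing Hadoop format or API:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical loader for illustration; the XML layout below is an assumption.
final class UserPolicyFileSketch {

  // Invented sample: one entry per user-defined policy, cell size in bytes.
  static final String SAMPLE =
      "<ec-policies>"
      + "<policy codec=\"rs\" dataUnits=\"10\" parityUnits=\"4\" cellSize=\"131072\"/>"
      + "<policy codec=\"xor\" dataUnits=\"2\" parityUnits=\"1\" cellSize=\"65536\"/>"
      + "</ec-policies>";

  /** Plain value holder for one user-defined policy. */
  static final class UserPolicy {
    final String codec;
    final int dataUnits;
    final int parityUnits;
    final int cellSize;

    UserPolicy(String codec, int dataUnits, int parityUnits, int cellSize) {
      if (codec.isEmpty() || dataUnits <= 0 || parityUnits <= 0 || cellSize <= 0) {
        throw new IllegalArgumentException("Invalid policy definition");
      }
      this.codec = codec;
      this.dataUnits = dataUnits;
      this.parityUnits = parityUnits;
      this.cellSize = cellSize;
    }
  }

  /** Parses every policy element; any malformed input is reported as IllegalArgumentException. */
  static List<UserPolicy> parse(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList nodes = doc.getElementsByTagName("policy");
      List<UserPolicy> result = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        Element e = (Element) nodes.item(i);
        result.add(new UserPolicy(
            e.getAttribute("codec"),
            Integer.parseInt(e.getAttribute("dataUnits")),
            Integer.parseInt(e.getAttribute("parityUnits")),
            Integer.parseInt(e.getAttribute("cellSize"))));
      }
      return result;
    } catch (Exception e) {
      throw new IllegalArgumentException("Invalid policy file", e);
    }
  }

  public static void main(String[] args) {
    for (UserPolicy p : parse(SAMPLE)) {
      System.out.println(p.codec + " " + p.dataUnits + "+" + p.parityUnits
          + " cell=" + p.cellSize);
    }
  }
}
```

Each entry covers the three cases listed above: the codec, the codec parameters, and the cell size.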

Yes, it's a nice-to-have. I have heard that some people want to try things 
other than the built-in ones available in the code. If it does not sound too 
heavyweight, we can work on it and get it into the release cycle.

Comments?


> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: erasure-coding
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
> PluggableErasureCodec.pdf, PluggableErasureCodec-v2.pdf, 
> PluggableErasureCodec-v3.pdf
>
>
> According to HDFS-7285 and its design, this issue considers supporting 
> multiple erasure codecs via a pluggable approach. It allows defining and 
> configuring multiple codec schemas with different coding algorithms and 
> parameters. The resulting codec schemas can be specified via the command 
> tool for different file folders. While designing and implementing such a 
> pluggable framework, we will also implement a concrete default codec 
> (Reed-Solomon) to prove that the framework is useful and workable. A 
> separate JIRA could be opened for the RS codec implementation.
> Note that HDFS-7353 will focus on the very low-level codec API and 
> implementation to make concrete vendor libraries transparent to the upper 
> layer. This JIRA focuses on the high-level parts that interact with 
> configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
