[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933994#comment-15933994 ]
Kai Zheng commented on HDFS-7337:
---------------------------------

[~andrew.wang], [~zhz], [~rakeshr] or anybody,

To keep this simple: given the code we already have, the goal here now seems easier to reach.

In {{ErasureCodingPolicyManager}} we have these built-in EC policies:
{code}
// Cell size shared by all built-in policies: 64 KB.
private static final int DEFAULT_CELLSIZE = 64 * 1024;
private static final ErasureCodingPolicy SYS_POLICY1 =
    new ErasureCodingPolicy(ErasureCodeConstants.RS_6_3_SCHEMA,
        DEFAULT_CELLSIZE, HdfsConstants.RS_6_3_POLICY_ID);
private static final ErasureCodingPolicy SYS_POLICY2 =
    new ErasureCodingPolicy(ErasureCodeConstants.RS_3_2_SCHEMA,
        DEFAULT_CELLSIZE, HdfsConstants.RS_3_2_POLICY_ID);
private static final ErasureCodingPolicy SYS_POLICY3 =
    new ErasureCodingPolicy(ErasureCodeConstants.RS_6_3_LEGACY_SCHEMA,
        DEFAULT_CELLSIZE, HdfsConstants.RS_6_3_LEGACY_POLICY_ID);
private static final ErasureCodingPolicy SYS_POLICY4 =
    new ErasureCodingPolicy(ErasureCodeConstants.XOR_2_1_SCHEMA,
        DEFAULT_CELLSIZE, HdfsConstants.XOR_2_1_POLICY_ID);
private static final ErasureCodingPolicy SYS_POLICY5 =
    new ErasureCodingPolicy(ErasureCodeConstants.RS_10_4_SCHEMA,
        DEFAULT_CELLSIZE, HdfsConstants.RS_10_4_POLICY_ID);
{code}

In {{ErasureCodeConstants}} we have these schemas used by the above policies:
{code}
public static final String RS_CODEC_NAME = "rs";
public static final String RS_LEGACY_CODEC_NAME = "rs-legacy";
public static final String XOR_CODEC_NAME = "xor";
public static final String HHXOR_CODEC_NAME = "hhxor";

// Each schema is (codec name, number of data units, number of parity units).
public static final ECSchema RS_6_3_SCHEMA = new ECSchema(
    RS_CODEC_NAME, 6, 3);
public static final ECSchema RS_3_2_SCHEMA = new ECSchema(
    RS_CODEC_NAME, 3, 2);
public static final ECSchema RS_6_3_LEGACY_SCHEMA = new ECSchema(
    RS_LEGACY_CODEC_NAME, 6, 3);
public static final ECSchema XOR_2_1_SCHEMA = new ECSchema(
    XOR_CODEC_NAME, 2, 1);
public static final ECSchema RS_10_4_SCHEMA = new ECSchema(
    RS_CODEC_NAME, 10, 4);
{code}

HDFS-11314 allows enforcing a set of
enabled EC policies on the NameNode, as follows:
{code:xml}
<property>
  <name>dfs.namenode.ec.policies.enabled</name>
  <value>RS-6-3-64k, RS-10-4-128k</value>
  <description>Comma-delimited list of enabled erasure coding policies.
    The NameNode will enforce this when setting an erasure coding policy
    on a directory.
  </description>
</property>
{code}

For each codec, the raw coder implementation can be configured as follows, using the {{rs}} codec as an example:
{code:xml}
<property>
  <name>io.erasurecode.codec.rs.rawcoder</name>
  <value>org.apache.hadoop.io.erasurecode.rawcoder.RSRawErasureCoderFactory</value>
  <description>
    Raw coder implementation for the rs codec. The default value is a
    pure Java implementation. There is also a native implementation. Its
    value is org.apache.hadoop.io.erasurecode.rawcoder.NativeRSRawErasureCoderFactory.
  </description>
</property>
{code}

Given the above, what is still missing is a mechanism (say, an XML file) that lets admin users define their own EC schemas and policies on the NameNode side. The reasons for doing this:
* Users want to try different codecs;
* Users want to use different codec parameters, e.g. 10 + 4 instead of 6 + 3 for the RS codec;
* Users want to try cell sizes other than 64k.

Yes, it's a nice-to-have. I've heard that some people want to try things other than the built-in ones available in the code. If it doesn't sound too heavyweight, we can work on it and get it into the release cycle. Comments?
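For reference, each built-in policy pairs a schema (codec, k data units, m parity units) with a cell size, so a stripe stores k + m cells for every k cells of user data. A minimal sketch of what that means for the built-in schemas, assuming the CODEC-k-m-<cell>k naming seen in the dfs.namenode.ec.policies.enabled value (e.g. RS-6-3-64k); the class PolicySketch and its methods are hypothetical and are not the Hadoop ErasureCodingPolicy API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical summary of an EC policy: schema (codec, k, m) plus cell size.
// Not the Hadoop ErasureCodingPolicy class; for illustration only.
final class PolicySketch {
    final String codec;     // codec name, e.g. "rs", "rs-legacy", "xor"
    final int dataUnits;    // k: data cells per stripe
    final int parityUnits;  // m: parity cells per stripe
    final int cellSize;     // bytes per cell

    PolicySketch(String codec, int dataUnits, int parityUnits, int cellSize) {
        this.codec = codec;
        this.dataUnits = dataUnits;
        this.parityUnits = parityUnits;
        this.cellSize = cellSize;
    }

    // Name in the CODEC-k-m-<cell>k style seen in the enabled-policies value,
    // e.g. "RS-6-3-64k". Assumes the cell size is a multiple of 1024.
    String name() {
        return String.format(Locale.ROOT, "%s-%d-%d-%dk",
            codec.toUpperCase(Locale.ROOT), dataUnits, parityUnits, cellSize / 1024);
    }

    // Raw bytes stored per byte of user data: (k + m) / k.
    double storageOverhead() {
        return (double) (dataUnits + parityUnits) / dataUnits;
    }

    public static void main(String[] args) {
        int defaultCellSize = 64 * 1024; // DEFAULT_CELLSIZE from ErasureCodingPolicyManager
        List<PolicySketch> builtIns = Arrays.asList(
            new PolicySketch("rs", 6, 3, defaultCellSize),
            new PolicySketch("rs", 3, 2, defaultCellSize),
            new PolicySketch("rs-legacy", 6, 3, defaultCellSize),
            new PolicySketch("xor", 2, 1, defaultCellSize),
            new PolicySketch("rs", 10, 4, defaultCellSize));
        for (PolicySketch p : builtIns) {
            // An MDS code such as RS survives the loss of any m of its k + m units.
            System.out.printf(Locale.ROOT, "%-18s overhead %.2fx, tolerates %d lost units%n",
                p.name(), p.storageOverhead(), p.parityUnits);
        }
    }
}
```

For example, RS 6 + 3 stores 9 cells per 6 data cells (1.50x) and tolerates 3 lost units, while RS 10 + 4 stores 14 per 10 (1.40x) and tolerates 4. The trade-off is a wider stripe, which needs more DataNodes (14 rather than 9) to place every unit of a stripe on a distinct node.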
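The enabled-policies property is a comma-delimited list that the NameNode checks when a policy is set on a directory. A minimal sketch of that check, assuming whitespace around the commas is ignored (the example value has a space after the comma); EnabledPolicies, checkSettable and the directory paths are hypothetical names for illustration, not the actual NameNode code:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical check mirroring what dfs.namenode.ec.policies.enabled is
// described to enforce; not the actual NameNode implementation.
final class EnabledPolicies {
    private final Set<String> enabled;

    // Parses a comma-delimited value such as "RS-6-3-64k, RS-10-4-128k".
    EnabledPolicies(String propertyValue) {
        Set<String> names = new LinkedHashSet<>();
        for (String part : propertyValue.split(",")) {
            String name = part.trim(); // assumption: surrounding whitespace is ignored
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        this.enabled = Collections.unmodifiableSet(names);
    }

    Set<String> names() {
        return enabled;
    }

    // Rejects setting a policy on a directory unless that policy is enabled.
    void checkSettable(String dir, String policyName) {
        if (!enabled.contains(policyName)) {
            throw new IllegalArgumentException("Policy " + policyName
                + " is not enabled; cannot set it on " + dir
                + ". Enabled policies: " + enabled);
        }
    }

    public static void main(String[] args) {
        EnabledPolicies policies = new EnabledPolicies("RS-6-3-64k, RS-10-4-128k");
        policies.checkSettable("/ec/cold", "RS-10-4-128k"); // in the list: accepted
        try {
            policies.checkSettable("/ec/hot", "XOR-2-1-64k"); // not in the list
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```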
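The raw coder setting is keyed per codec and names a factory class, which is what makes the coder implementation pluggable. A minimal sketch of resolving such a key, assuming the io.erasurecode.codec.<codec>.rawcoder pattern of the rs example applies to other codecs, a plain Map standing in for the Hadoop Configuration, and a hypothetical CoderFactory interface standing in for the real raw coder factory type; the two nested factories are placeholders for the pure Java and native implementations:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a raw coder factory type; not the Hadoop interface.
interface CoderFactory {
    String coderName();
}

final class RawCoderLookup {
    // Placeholder for the pure Java implementation (the default).
    public static final class PureJavaRsFactory implements CoderFactory {
        public PureJavaRsFactory() {}
        public String coderName() { return "rs-java"; }
    }

    // Placeholder for a native implementation selected through configuration.
    public static final class NativeRsFactory implements CoderFactory {
        public NativeRsFactory() {}
        public String coderName() { return "rs-native"; }
    }

    // Per-codec key, following the io.erasurecode.codec.rs.rawcoder example.
    static String keyFor(String codec) {
        return "io.erasurecode.codec." + codec + ".rawcoder";
    }

    // Loads the configured factory class for a codec, or the fallback if unset.
    static CoderFactory resolve(Map<String, String> conf, String codec,
                                Class<? extends CoderFactory> fallback) {
        String className = conf.get(keyFor(codec));
        try {
            Class<?> cls = (className == null) ? fallback : Class.forName(className);
            if (!CoderFactory.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(cls.getName() + " is not a CoderFactory");
            }
            return (CoderFactory) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load raw coder factory " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Key unset: falls back to the pure Java placeholder.
        System.out.println(resolve(conf, "rs", PureJavaRsFactory.class).coderName());
        // Key set: the configured class name overrides the fallback.
        conf.put(keyFor("rs"), NativeRsFactory.class.getName());
        System.out.println(resolve(conf, "rs", PureJavaRsFactory.class).coderName());
    }
}
```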
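To make the proposed admin-defined schemas and policies more concrete, here is a minimal sketch of loading such an XML file on the NameNode side with the JDK's built-in DOM parser. The file layout (schema elements with id, codec, k and m attributes; policy elements referencing a schema id plus a cell size) and the class UserEcPolicyFile are hypothetical, invented only for illustration; the real format would be settled in this JIRA:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical loader for an admin-supplied file of EC schemas and policies.
// Element and attribute names are invented for this sketch.
final class UserEcPolicyFile {
    static final class Schema {
        final String codec; final int k; final int m;
        Schema(String codec, int k, int m) { this.codec = codec; this.k = k; this.m = m; }
    }

    static final class Policy {
        final Schema schema; final int cellSize;
        Policy(Schema schema, int cellSize) { this.schema = schema; this.cellSize = cellSize; }

        // Same CODEC-k-m-<cell>k naming as the enabled-policies example.
        String name() {
            return schema.codec.toUpperCase(Locale.ROOT) + "-" + schema.k + "-"
                + schema.m + "-" + (cellSize / 1024) + "k";
        }
    }

    static List<Policy> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Malformed EC policy file", e);
        }

        // Schemas are declared once by id ...
        Map<String, Schema> schemas = new HashMap<>();
        NodeList schemaNodes = doc.getElementsByTagName("schema");
        for (int i = 0; i < schemaNodes.getLength(); i++) {
            Element e = (Element) schemaNodes.item(i);
            int k = Integer.parseInt(e.getAttribute("k")); // NumberFormatException if missing
            int m = Integer.parseInt(e.getAttribute("m"));
            if (k <= 0 || m <= 0) {
                throw new IllegalArgumentException("k and m must be positive: " + e.getAttribute("id"));
            }
            schemas.put(e.getAttribute("id"), new Schema(e.getAttribute("codec"), k, m));
        }

        // ... and each policy pairs a declared schema id with a cell size in bytes.
        List<Policy> policies = new ArrayList<>();
        NodeList policyNodes = doc.getElementsByTagName("policy");
        for (int i = 0; i < policyNodes.getLength(); i++) {
            Element e = (Element) policyNodes.item(i);
            Schema schema = schemas.get(e.getAttribute("schema"));
            if (schema == null) {
                throw new IllegalArgumentException("Unknown schema: " + e.getAttribute("schema"));
            }
            int cellSize = Integer.parseInt(e.getAttribute("cellsize"));
            // Assumption for this sketch: cell sizes are positive multiples of 1 KiB.
            if (cellSize <= 0 || cellSize % 1024 != 0) {
                throw new IllegalArgumentException("Bad cell size: " + cellSize);
            }
            policies.add(new Policy(schema, cellSize));
        }
        return policies;
    }

    public static void main(String[] args) {
        String xml = "<ecPolicies>"
            + "<schemas><schema id='rs-10-4' codec='rs' k='10' m='4'/></schemas>"
            + "<policies><policy schema='rs-10-4' cellsize='131072'/></policies>"
            + "</ecPolicies>";
        for (Policy p : parse(xml)) {
            System.out.println(p.name()); // RS-10-4-128k
        }
    }
}
```

Keeping schemas and policies as separate, id-linked entries means one schema (say RS 10 + 4) can back several policies that differ only in cell size, which covers all three use cases listed in the comment.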
> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: erasure-coding
>            Reporter: Zhe Zhang
>            Assignee: Kai Zheng
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-7337-prototype-v1.patch,
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip,
> PluggableErasureCodec.pdf, PluggableErasureCodec-v2.pdf,
> PluggableErasureCodec-v3.pdf
>
>
> According to HDFS-7285 and the design, this aims to support multiple
> erasure codecs via a pluggable approach. It allows defining and configuring
> multiple codec schemas with different coding algorithms and parameters. The
> resulting codec schemas can be specified via a command-line tool for
> different directories. While designing and implementing such a pluggable
> framework, a concrete default codec (Reed-Solomon) will also be implemented
> to prove the framework is useful and workable. A separate JIRA could be
> opened for the RS codec implementation.
> Note HDFS-7353 will focus on the very low-level codec API and implementation,
> making concrete vendor libraries transparent to the upper layer. This JIRA
> focuses on higher-level concerns that interact with configuration, schemas, etc.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)