[ 
https://issues.apache.org/jira/browse/HDFS-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723896#comment-15723896
 ] 

Andrew Wang commented on HDFS-11072:
------------------------------------

Hi Sammi, thanks for working on this. Some review comments in addition to 
Rakesh's:

* Can we just say "replication" rather than "continuous replicate"? e.g. 
"getReplicationPolicy" instead of "getContinuousReplicatePolicy"
* Note that setting a "replication" EC policy is still different from 
unsetting. Unsetting means the policy will be inherited from an ancestor. 
Setting a "replication" policy means the "replication" policy will be used. 
Imagine a situation where "/a" has RS 6,3 set and "/a/b" has XOR 2,1 
set. On "/a/b", unsetting vs. setting "replication" will have different 
effects. So we also need an unset API, similar to the unset storage policy API.
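
To make the "/a" and "/a/b" scenario concrete, here is a toy sketch of the lookup. It is not HDFS code: the class, the method names, and the "REPLICATION" policy name are all hypothetical, and it assumes normalized absolute paths with no trailing slash.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of directory-level EC policy inheritance; not HDFS code. */
final class EcPolicyInheritanceSketch {
  /** Hypothetical name for the special "replication" policy. */
  static final String REPLICATION = "REPLICATION";

  /** Explicitly set policies, keyed by normalized absolute directory path. */
  private final Map<String, String> explicit = new HashMap<>();

  void setPolicy(String dir, String policy) {
    explicit.put(dir, policy);
  }

  /** Unsetting removes the explicit entry, so lookups fall through to ancestors. */
  void unsetPolicy(String dir) {
    explicit.remove(dir);
  }

  /** Walks from dir up to the root and returns the nearest explicit policy, if any. */
  Optional<String> effectivePolicy(String dir) {
    String cur = dir;
    while (true) {
      String policy = explicit.get(cur);
      if (policy != null) {
        return Optional.of(policy);
      }
      if (cur.equals("/")) {
        return Optional.empty();
      }
      int slash = cur.lastIndexOf('/');
      cur = (slash == 0) ? "/" : cur.substring(0, slash);
    }
  }

  public static void main(String[] args) {
    EcPolicyInheritanceSketch fs = new EcPolicyInheritanceSketch();
    fs.setPolicy("/a", "RS-6-3");
    fs.setPolicy("/a/b", "XOR-2-1");

    // Setting "replication" on /a/b overrides the ancestor's RS policy.
    fs.setPolicy("/a/b", REPLICATION);
    System.out.println(fs.effectivePolicy("/a/b")); // Optional[REPLICATION]

    // Unsetting /a/b makes it inherit RS-6-3 from /a.
    fs.unsetPolicy("/a/b");
    System.out.println(fs.effectivePolicy("/a/b")); // Optional[RS-6-3]
  }
}
```

The point is only that the two operations leave "/a/b" in different states: one stores an explicit override, the other removes the entry so that the ancestor's policy applies.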

For the comment in ECPolicyManager, I recommend rewording it like this:
{noformat}
  /*
   * This is a special policy. When this policy is applied to a directory, its
   * children will be replicated rather than inheriting an erasure coding policy
   * from an ancestor directory.
   *
   * This policy is only used when setting an erasure coding policy. It will
   * not be returned when get erasure coding policy is called.
   */
{noformat}

* FSDirErasureCodingOp: rename "ecXAttrExisted" to "hasEcXAttr"
* FSDirErasureCodingOp: we should rename createErasureCodingPolicyXAttr to 
setErasureCodingPolicyXAttr, since it can now replace an existing xattr
* Why do we hide the replication policy for calls to 
getErasureCodingPolicyForPath for directories? Makes sense for files since they 
are just replicated, but directory-level policies act like normal EC policies 
in that they can be inherited.
* Rather than adding a new function getErasureCodingPolicyXAttrForLastINode to 
set a boolean, it seems like we could call a "hasErasureCodingPolicy" method (the 
current one is also unused). Since this is only for paths that exist, it's safe 
to use FSDirectory.resolveLastINode instead of a for loop that skips nulls. We 
only need that for loop when creating a new path.
* To assist with the above, I feel like we should have a 
{{getErasureCodingPolicy(INode)}} method that does this block in 
getErasureCodingPolicyForPath:

{code}
        final XAttrFeature xaf = inode.getXAttrFeature();
        if (xaf != null) {
          XAttr xattr = xaf.getXAttr(XATTR_ERASURECODING_POLICY);
          if (xattr != null) {
            ByteArrayInputStream bIn =
                new ByteArrayInputStream(xattr.getValue());
            DataInputStream dIn = new DataInputStream(bIn);
            String ecPolicyName = WritableUtils.readString(dIn);
            if (!ecPolicyName.equalsIgnoreCase(ErasureCodingPolicyManager
                .getContinuousReplicatePolicy().getName())) {
              return fsd.getFSNamesystem().getErasureCodingPolicyManager().
                  getPolicyByName(ecPolicyName);
            } else {
              return null;
            }
          }
        }
{code}
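
As a rough illustration of the shape such a helper could take, here is a self-contained sketch with stand-in types. Everything in it is an assumption for illustration: the class, the "replication" name, and the one-entry registry are hypothetical; the helper takes the raw xattr bytes rather than an INode; and it uses {{DataOutputStream.writeUTF}} / {{DataInputStream.readUTF}} as a stand-in for the {{WritableUtils}} string encoding. It mirrors the block above, including returning null for the replication policy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Map;

/** Stand-in sketch of the proposed helper; none of these types are HDFS classes. */
final class EcPolicyLookupSketch {
  /** Hypothetical name of the special replication policy. */
  static final String REPLICATION_NAME = "replication";

  /** Hypothetical registry standing in for a lookup of policies by name. */
  static final Map<String, String> POLICIES =
      Collections.singletonMap("RS-6-3-64k", "RS(6,3), 64k cells");

  /** Encodes a policy name the way this sketch stores it in the xattr value. */
  static byte[] encode(String name) {
    ByteArrayOutputStream bOut = new ByteArrayOutputStream();
    try (DataOutputStream dOut = new DataOutputStream(bOut)) {
      dOut.writeUTF(name);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bOut.toByteArray();
  }

  /**
   * Returns the named policy, or null when there is no xattr value or when the
   * value names the replication policy.
   */
  static String getErasureCodingPolicy(byte[] xattrValue) {
    if (xattrValue == null) {
      return null;
    }
    try (DataInputStream dIn =
        new DataInputStream(new ByteArrayInputStream(xattrValue))) {
      String name = dIn.readUTF();
      if (name.equalsIgnoreCase(REPLICATION_NAME)) {
        return null;
      }
      return POLICIES.get(name);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(getErasureCodingPolicy(encode("RS-6-3-64k")));
    System.out.println(getErasureCodingPolicy(encode("replication")));
  }
}
```

With the decoding in one place, both the path-based getter and a "hasErasureCodingPolicy"-style check could call the same helper instead of duplicating the xattr parsing.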

Documentation:
* "Another purpose of this special policy is to unset the erasure coding policy 
of a directory back to the traditional replications.", I don't think we should 
say this, since we also support actually unsetting the EC policy. The 
replication policy is still a policy that overrides policies on ancestor 
directories.
* Do the parameters "1-2-64K" have any meaning? If not, we should explain that 
they are meaningless, or hide the parameters so we don't need to talk about 
them.

Tests:
* It's better to use more specific asserts like {{assertNull}}, 
{{assertNotNull}}, etc. instead of just {{assertTrue}}
* Would be good to create files with different replication factors.

> Add ability to unset and change directory EC policy
> ---------------------------------------------------
>
>                 Key: HDFS-11072
>                 URL: https://issues.apache.org/jira/browse/HDFS-11072
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Assignee: SammiChen
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-11072-v1.patch, HDFS-11072-v2.patch, 
> HDFS-11072-v3.patch, HDFS-11072-v4.patch
>
>
> Since the directory-level EC policy simply applies to files at create time, 
> it makes sense to make it more similar to storage policies and allow changing 
> and unsetting the policy.



