[ 
https://issues.apache.org/jira/browse/HDFS-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741637#comment-15741637
 ] 

SammiChen edited comment on HDFS-11072 at 12/12/16 11:08 AM:
-------------------------------------------------------------

Hi Andrew, thanks for sharing your thoughts. Regarding redoing the policy on a 
directory tree: even if we tell the user whether a policy is inherited or not, 
the user still needs to walk the tree and undo the policy one directory at a 
time, because a subdirectory can override its parent directory's policy with 
its own. The alternative would be a feature like "replace all children with 
this directory's policy", which is not feasible in a distributed environment. 
For distcp, how about adding an option to explicitly preserve the inherited 
policy (erasure coding policy or storage policy)? Just a thought; I'm not sure 
whether this would introduce massive complexity into distcp's implementation. 
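
To illustrate why the undo has to go directory by directory, here is a minimal 
sketch in Python. It is a toy in-memory model, not HDFS code: the dict of 
explicit policies, the helper name clear_overrides, and the policy names are 
all made up for illustration.

```python
from typing import Dict, List


def clear_overrides(explicit: Dict[str, str], root: str) -> List[str]:
    """Remove every explicit policy set strictly below `root`.

    `explicit` maps a directory path to the policy set directly on it
    (hypothetical model). Returns the paths that were cleared.
    """
    root = root.rstrip("/") or "/"
    prefix = "/" if root == "/" else root + "/"
    # Every overriding descendant has to be found and reset individually;
    # clearing the root alone would leave the overrides in place.
    cleared = sorted(p for p in explicit if p != root and p.startswith(prefix))
    for path in cleared:
        del explicit[path]
    return cleared


# Placeholder policy names, chosen only for the example.
explicit = {
    "/cold": "RS-6-3",
    "/cold/a": "XOR-2-1",
    "/cold/a/b": "REPLICATION",
    "/other": "RS-6-3",
}
print(clear_overrides(explicit, "/cold"))  # ['/cold/a', '/cold/a/b']
```

In this model the cost is one removal per overriding subdirectory. On a real 
cluster each of those would presumably be a separate namespace operation, 
which is the part that worries me.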

I'm glad you also like the idea of introducing a new API. So, for erasure 
coding policy, there will be 4 APIs:
1. setErasureCodingPolicy
    Sets an EC policy on a directory.
2. removeErasureCodingPolicy
    Removes the policy (EC or replication) set on a directory. After removal, 
    the directory goes back to inheriting from its parent directory. (The word 
    "remove" is used more often in DistributedFileSystem API names.)
3. setDefaultReplicationPolicy
    Sets replication on a directory. This is only useful when the user wants 
    the directory to stop inheriting its parent's EC policy.
4. getErasureCodingPolicy
    Returns the policy set by setErasureCodingPolicy.
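
The intended semantics of the four calls can be sketched with a small Python 
model. This is only a sketch under my own assumptions, not the patch and not 
the DistributedFileSystem API: the class PolicyTree, the REPLICATION marker, 
the effective_policy helper, and the policy name "RS-6-3" are hypothetical. In 
particular, having the getter return nothing for a directory that was 
explicitly set to replication is an assumption, not something we have agreed.

```python
from typing import Dict, Optional

REPLICATION = "REPLICATION"  # hypothetical marker for explicit replication


class PolicyTree:
    """Toy in-memory model of per-directory policies; not the HDFS API."""

    def __init__(self) -> None:
        # Directory path -> policy set directly on that directory.
        self._explicit: Dict[str, str] = {}

    def set_erasure_coding_policy(self, path: str, policy: str) -> None:
        self._explicit[path] = policy

    def remove_erasure_coding_policy(self, path: str) -> None:
        # Drops any explicit policy (EC or replication), so the directory
        # goes back to inheriting from its parent.
        self._explicit.pop(path, None)

    def set_default_replication_policy(self, path: str) -> None:
        # Stops the directory from inheriting the parent's EC policy.
        self._explicit[path] = REPLICATION

    def get_erasure_coding_policy(self, path: str) -> Optional[str]:
        # Assumption: only an EC policy set directly on the directory counts.
        policy = self._explicit.get(path)
        return None if policy == REPLICATION else policy

    def effective_policy(self, path: str) -> str:
        # Hypothetical helper: walk up to the nearest explicit policy,
        # falling back to replication at the root.
        current = path
        while True:
            if current in self._explicit:
                return self._explicit[current]
            if current == "/":
                return REPLICATION
            current = current.rsplit("/", 1)[0] or "/"


tree = PolicyTree()
tree.set_erasure_coding_policy("/cold", "RS-6-3")
tree.set_default_replication_policy("/cold/hot-again")
print(tree.effective_policy("/cold/2016"))       # RS-6-3
print(tree.effective_policy("/cold/hot-again"))  # REPLICATION
tree.remove_erasure_coding_policy("/cold/hot-again")
print(tree.effective_policy("/cold/hot-again"))  # RS-6-3
```

The model shows why the third call is needed: without an explicit replication 
marker, a subdirectory under an EC parent has no way to opt out of the 
inherited policy.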

But even with a new API to handle the replication case, it's still kind of 
complicated. The complexity comes from the "replication" policy. From my 
limited knowledge, EC is suggested for cold data and replication is suggested 
for hot data. Setting replication on a subdirectory under an EC parent 
directory is useful when cold data becomes hot again, right? But I don't know 
how common this scenario is, or whether it is worth introducing the complexity 
to handle it. 

Anyway, I'm OK with the 4-API solution. I just want to make sure we are on the 
same page before I start refining the patch. 




> Add ability to unset and change directory EC policy
> ---------------------------------------------------
>
>                 Key: HDFS-11072
>                 URL: https://issues.apache.org/jira/browse/HDFS-11072
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Assignee: SammiChen
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-11072-v1.patch, HDFS-11072-v2.patch, 
> HDFS-11072-v3.patch, HDFS-11072-v4.patch
>
>
> Since the directory-level EC policy simply applies to files at create time, 
> it makes sense to make it more similar to storage policies and allow changing 
> and unsetting the policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
