[
https://issues.apache.org/jira/browse/HDFS-11072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741637#comment-15741637
]
SammiChen edited comment on HDFS-11072 at 12/12/16 11:09 AM:
-------------------------------------------------------------
Hi Andrew, thanks for sharing your thoughts. On redoing a policy across a
directory tree: even if we tell the user whether a policy is inherited or not,
the user still needs to walk the tree and undo the policy one directory at a
time, because a subdirectory can override its parent directory's policy with
its own. The alternative would be a feature like "replace all children with
this directory's policy", which is not feasible in a distributed environment.
For distcp, how about adding an option to explicitly preserve the inherited
policy (erasure coding policy or storage policy)? Just a thought; I'm not sure
whether this would introduce massive complexity into distcp's implementation.
I'm glad you also like the idea of introducing a new API. So, for erasure
coding policy, there will be 4 APIs.
1. setErasureCodingPolicy
set an EC policy on a directory
2. removeErasureCodingPolicy
remove the policy (EC or replication) from a directory; after removal, the
directory goes back to inheriting from its parent directory (the word "remove"
is used more often than "unset" in DistributedFileSystem API names)
3. setDefaultReplicationPolicy
set replication on a directory. This is only useful when the user wants the
directory to stop inheriting its parent's EC policy.
4. getErasureCodingPolicy
return the policy set by setErasureCodingPolicy
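To make the intended semantics of the four calls concrete, here is a minimal
in-memory sketch in Java. It is only a toy model, not the HDFS implementation:
the class name EcPolicyModel, the REPLICATION marker, the effectivePolicy helper
and the policy string "RS-6-3" are all made up for illustration, and returning
null from getErasureCodingPolicy when no EC policy was set on the directory
itself is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the four proposed calls. Hypothetical; not HDFS code. */
public class EcPolicyModel {
    /** Hypothetical marker meaning "this directory is explicitly replicated". */
    public static final String REPLICATION = "REPLICATION";

    // Policies set explicitly on a directory, keyed by absolute path.
    private final Map<String, String> explicit = new HashMap<>();

    /** 1. Set an EC policy on a directory. */
    public void setErasureCodingPolicy(String dir, String ecPolicy) {
        explicit.put(normalize(dir), ecPolicy);
    }

    /** 2. Remove any explicit policy (EC or replication); dir inherits again. */
    public void removeErasureCodingPolicy(String dir) {
        explicit.remove(normalize(dir));
    }

    /** 3. Set replication on a directory so it stops inheriting a parent's EC policy. */
    public void setDefaultReplicationPolicy(String dir) {
        explicit.put(normalize(dir), REPLICATION);
    }

    /** 4. Return the EC policy set on this directory itself, or null (assumption). */
    public String getErasureCodingPolicy(String dir) {
        String policy = explicit.get(normalize(dir));
        return REPLICATION.equals(policy) ? null : policy;
    }

    /** Helper: policy of the nearest directory with an explicit setting, else replication. */
    public String effectivePolicy(String dir) {
        String path = normalize(dir);
        while (true) {
            String policy = explicit.get(path);
            if (policy != null) {
                return policy;
            }
            if (path.equals("/")) {
                return REPLICATION; // nothing set anywhere up to the root
            }
            int slash = path.lastIndexOf('/');
            path = (slash == 0) ? "/" : path.substring(0, slash);
        }
    }

    // Drop a trailing slash so "/a/" and "/a" refer to the same directory.
    private static String normalize(String dir) {
        if (dir.length() > 1 && dir.endsWith("/")) {
            return dir.substring(0, dir.length() - 1);
        }
        return dir;
    }
}
```

In this model, calling setDefaultReplicationPolicy on a subdirectory of an EC
directory makes new files under it replicated, and removeErasureCodingPolicy on
that subdirectory makes it inherit the parent's EC policy again.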
But even if we introduce a new API to handle the replication case, it's still
kind of complicated. The complexity comes from the "replication" policy. From
my limited knowledge, EC is suggested for cold data, and replication is
suggested for hot data. Setting replication on a subdirectory under a parent EC
directory is useful when cold data becomes hot again, right? But I don't know
how common this scenario is, or whether it is worth introducing the complexity
to handle it.
Anyway, I'm OK with the 4-API solution. I just want to make sure we are on the
same page before I start to refine the patch.
> Add ability to unset and change directory EC policy
> ---------------------------------------------------
>
> Key: HDFS-11072
> URL: https://issues.apache.org/jira/browse/HDFS-11072
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: erasure-coding
> Affects Versions: 3.0.0-alpha1
> Reporter: Andrew Wang
> Assignee: SammiChen
> Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-11072-v1.patch, HDFS-11072-v2.patch,
> HDFS-11072-v3.patch, HDFS-11072-v4.patch
>
>
> Since the directory-level EC policy simply applies to files at create time,
> it makes sense to make it more similar to storage policies and allow changing
> and unsetting the policy.