[jira] [Commented] (HDFS-7081) Add new DistributedFileSystem API for getting all the existing storage policies

Andrew Wang (JIRA) Tue, 23 Sep 2014 18:20:45 -0700

    [ https://issues.apache.org/jira/browse/HDFS-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145738#comment-14145738 ]


Andrew Wang commented on HDFS-7081:
-----------------------------------

bq. If we can set storage policy directly on a directory, why do we still need 
to do it recursively? But to provide a tool for easier administration (not just 
for setting storage policy) is always good.

This is related to my question about renames. I could see an admin wanting to 
know that everything in a subtree uses some storage policy. However, if a file 
already has a policy set and is renamed underneath this subtree, the subtree's 
policy won't apply. A recursive tool could be used to satisfy this use case.
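
To make the scenario concrete, here is a minimal sketch using a toy in-memory namespace. Nothing in it is HDFS code: Node, SubtreeAudit, and their methods are hypothetical names, and the rule that an unset policy resolves to the nearest ancestor's policy is an assumption taken from the discussion above. It shows a file with its own policy keeping that policy after a rename, and a recursive check that reports such files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy namespace node; hypothetical, not the HDFS INode classes.
class Node {
    final String name;
    final boolean isDir;
    String policy;   // null means unspecified: inherit from ancestors
    Node parent;
    final Map<String, Node> children = new HashMap<>();

    Node(String name, boolean isDir, String policy) {
        this.name = name;
        this.isDir = isDir;
        this.policy = policy;
    }

    Node add(Node child) {
        child.parent = this;
        children.put(child.name, child);
        return child;
    }

    // Nearest explicitly set policy on the path to the root, else the default.
    String effectivePolicy(String defaultPolicy) {
        for (Node n = this; n != null; n = n.parent) {
            if (n.policy != null) {
                return n.policy;
            }
        }
        return defaultPolicy;
    }

    // Rename modeled as detach plus attach; the node's own policy is untouched.
    void moveTo(Node newParent) {
        if (parent != null) {
            parent.children.remove(name);
        }
        newParent.add(this);
    }

    String path() {
        // The root is assumed to have the empty string as its name.
        return parent == null ? name : parent.path() + "/" + name;
    }
}

class SubtreeAudit {
    // Recursively collect files whose effective policy differs from the expected one.
    static List<String> mismatches(Node root, String expected, String defaultPolicy) {
        List<String> out = new ArrayList<>();
        collect(root, expected, defaultPolicy, out);
        return out;
    }

    private static void collect(Node n, String expected, String dflt, List<String> out) {
        if (!n.isDir) {
            if (!expected.equals(n.effectivePolicy(dflt))) {
                out.add(n.path());
            }
            return;
        }
        for (Node c : n.children.values()) {
            collect(c, expected, dflt, out);
        }
    }
}
```

In this model, a file created with an explicit HOT policy and then moved under a COLD directory is the only entry the audit reports; a file with no policy of its own picks up COLD after the move.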

As one data point, I know Hive uses a temp dir during query processing and 
renames things in and out.

I'm still hoping we can avoid this rename ambiguity though, since it'd make 
management simpler. If we need per-file granularity, then I think my idea from 
above would work. Basically, do not set UNSPECIFIED on files. At create time, a 
file sets its storage policy to either an inherited parent policy or the 
default policy. Then rename will never change a file's policy.
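
The proposal can be sketched as follows. This is only an illustration under stated assumptions: PolicyDir, PolicyFile, and DEFAULT_POLICY are hypothetical names, and HOT is assumed to be the default policy. The point is that the file's policy is resolved once, at create time, and stored as a concrete value, so a later rename has nothing to re-resolve.

```java
// Hypothetical model of the create-time proposal; not HDFS code.
class PolicyDir {
    final PolicyDir parent;
    final String policy;   // null when no policy is set on this directory

    PolicyDir(PolicyDir parent, String policy) {
        this.parent = parent;
        this.policy = policy;
    }

    // Nearest policy set on this directory or an ancestor, or null if none.
    String inheritedPolicy() {
        for (PolicyDir d = this; d != null; d = d.parent) {
            if (d.policy != null) {
                return d.policy;
            }
        }
        return null;
    }
}

class PolicyFile {
    static final String DEFAULT_POLICY = "HOT";   // assumed default

    final String policy;   // never null: fixed when the file is created
    PolicyDir parent;

    private PolicyFile(PolicyDir parent, String policy) {
        this.parent = parent;
        this.policy = policy;
    }

    // Resolve the policy once: inherited parent policy, else the default.
    static PolicyFile create(PolicyDir parent) {
        String inherited = parent.inheritedPolicy();
        return new PolicyFile(parent, inherited != null ? inherited : DEFAULT_POLICY);
    }

    // Rename only changes the parent pointer; the stored policy is final.
    void rename(PolicyDir newParent) {
        this.parent = newParent;
    }
}
```

The trade-off in this sketch is that changing a directory's policy later does not affect files already created under it, which is where a recursive tool would still be useful.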

bq. For this one I have a question. According to the current document "TRUSTED 
namespace attributes are only visible and accessible to privileged users." 
Currently the storage policy is actually set by superuser and in HDFS we do not 
have root user. So does that mean we should use trusted here?

TRUSTED and USER are meant to be used by end-user applications. The idea is 
that apps can stash whatever app data they want in those xattr namespaces and 
not worry about name collisions (except with other apps). For HDFS developers 
who want to leverage xattr storage for a feature, an internal namespace like 
SYSTEM is more appropriate so as not to pollute the user namespaces. As we're 
doing in this JIRA, the additional data can be exposed to users via some new 
API, rather than through getXAttrs.
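
A rough sketch of that separation, using a toy attribute store rather than the real HDFS xattr code. XAttrSketch, its methods, and the attribute name system.storage.policy are all hypothetical, and the visibility rules are simplified: the generic listing returns only user attributes (plus trusted ones for a superuser), while the feature's internal attribute is reachable only through a dedicated accessor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy per-file xattr store; hypothetical, not the HDFS implementation.
class XAttrSketch {
    private final Map<String, String> xattrs = new LinkedHashMap<>();

    void set(String namespace, String name, String value) {
        xattrs.put(namespace + "." + name, value);
    }

    // Generic listing: "user" for everyone, "trusted" only for a superuser,
    // and "system" never, so internal feature data stays out of app view.
    Map<String, String> getXAttrs(boolean isSuperUser) {
        Map<String, String> visible = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : xattrs.entrySet()) {
            String key = e.getKey();
            if (key.startsWith("user.") || (isSuperUser && key.startsWith("trusted."))) {
                visible.put(key, e.getValue());
            }
        }
        return visible;
    }

    // Dedicated accessor for the feature; the attribute name is made up.
    String getStoragePolicy() {
        return xattrs.get("system.storage.policy");
    }
}
```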

As to the rest, I'll just trust you and Nic. I'm not sure I'll have time to 
review more this week, so we can just do follow-ons. Thanks guys.

> Add new DistributedFileSystem API for getting all the existing storage 
> policies
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-7081
>                 URL: https://issues.apache.org/jira/browse/HDFS-7081
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer, namenode
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7081.000.patch, HDFS-7081.001.patch, 
> HDFS-7081.002.patch, HDFS-7081.003.patch
>
>
> Instead of loading all the policies from a client side configuration file, it 
> may be better to provide Mover with a new RPC call for getting all the 
> storage policies from the namenode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
