[ 
https://issues.apache.org/jira/browse/HDFS-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7081:
----------------------------
    Attachment: HDFS-7081.003.patch

Thanks for the comments, Andrew. Upload a new patch to address your comments. 
Since some of your comments on BlockStoragePolicySuite/BlockStoragePolicy are 
easy to address, I just include them in this patch. Maybe we can create a 
separate jira to address the remaining comments.

bq. Any reason we don't also expose java APIs for setting/getting the policy in 
HdfsAdmin? Need to have public classes then too.

Initially we planned to allow normal users to set storage policy in the next
step of heterogeneous storage. Currently I just follow your comments
and add the setting policy API into HdfsAdmin. We can remove it in the future 
if necessary. (For getting the policy it does not require superuser thus we do 
not need to add it in HdfsAdmin).

bq. Does DFSAdmin have a way of getting the list of storage policies? Not sure 
how you'd know what policy to set if you don't know what's available.

Getting the list of storage policies does not require superuser permission thus
actually we should add a separate tool for this (similar with
listSnapshottableDir which also allows a normal user to list snapshottble dirs
belong to him/her). I plan to add this in a separate jira since the current 
patch is already big.

bq. Also a bit surprised that the API is string based rather than ID based. 
Making it ID based would be more efficient and also allow renaming policies if 
desired.

I guess you're referring to {{setStoragePolicy}} here? Currently internally
(inside of NameNode) we attach ID to files/directories (thus a policy can be
easily renamed), and we use String for end users to specify the policy, where
I think to use the policy name is much more user-friendly. And since it is not
common to set storage policy for a large amount of files/directories and the 
name is usually short, the efficiency may not be an issue here.

bq. FSNamesystem: Need javadoc on new getter

Done. Thanks for pointing this out.

bq. The handling of directory storage policies seems inconsistent with rename. 
If you rename an UNSPECIFIED file, it will pick up a potentially different 
storage ID from its destination. However, if it already has one set (e.g. it 
was created under a dir with a set policy), it'll keep that policy. Have you 
considered always setting the file's storage policy to the default? Then rename 
would be consistent. The other alternative is doing away with per-file storage 
policies and only specifying them on a directory. This also seems reasonable to 
me from a management perspective.
bq. This seems worth documenting at the least.

I think to have policy on both files/directories is necessary since in the
future we can have tools/services keeping scanning the namespace and marking 
files/directories based on their access/modification time etc. Currently we do 
not keep the UNSPECIFIED policy mainly because we want to allow users to set a 
cold/warm directory and then to change data's temperature by moving 
files/directories into it. We will add more details into the document.

bq. We should also consider having tooling that can do a recursive 
setStoragePolicy for easier administration.

If we can set storage policy directly on a directory, why do we still need to do
it recursively? But to provide a tool for easier administration (not just for
setting storage policy) is always good.

bq. BlockStoragePolicySuite: In the class javadoc, "Suite" by itself doesn't 
mean much to me. Could we mention "collection of storage policies"?

Done.

bq. The xattr name "bsp" is not very descriptive. Maybe 
"hsm.block.storage.policy.id" instead? We dedupe the names in memory, so the 
size of the name only really matters when it's serialized to the fsimage. I 
think we could dedupe them there too with a little work.

Done. This is a very good suggestion.

bq. We also shouldn't be using the trusted namespace, since it pollutes a 
namespace that's meant for use only by the root user. I'd recommend system 
instead.

For this one I have a question. According to the current document "TRUSTED
namespace attributes are only visible and accessible to privileged users."
Currently the storage policy is actually set by superuser and in HDFS we do not 
have root user. So does that mean we should use trusted here?

bq. BlockStoragePolicy#readBlockStorageSuite(conf) doesn't seem to be used, 
could remove some other helper functions then too
bq. Would prefer if we didn't initialize DEFAULT_SUITE until it's needed, avoid 
some static init cost. Can be straight up removed if you remove 
readBlockStorageSuite(conf).

Actually I kept this part of code because I planed to reuse/move this part of 
code in DFSAdmin for admin to set new policies based on configuration files. 
Since this is out of the scope of this jira currently I will just remove all 
the code.

bq. Has some lines longer than 80 chars

Done.

bq. BlockStoragePolicy: Should be using slf4j for new logging

Done for BlockStoragePolicy/BlockStoragePolicySuite.

bq. I'd strongly consider a builder pattern for this constructor, since it 
seems likely/possible that we'll want to add more fields here.

Currently the BlockStoragePolicy has only one constructor. All its 5 fields are
final and must be specified without default values. So right now I do not see
the necessity of a builder here, but we can add it in the future if necessary.

bq. diff: I'm not sure what a "list difference" is, but it's not a set 
difference since it doesn't handle duplicates. Update javadoc about the 
expectations?

Done.

> Add new DistributedFileSystem API for getting all the existing storage 
> policies
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-7081
>                 URL: https://issues.apache.org/jira/browse/HDFS-7081
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer, namenode
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7081.000.patch, HDFS-7081.001.patch, 
> HDFS-7081.002.patch, HDFS-7081.003.patch
>
>
> Instead of loading all the policies from a client side configuration file, it 
> may be better to provide Mover with a new RPC call for getting all the 
> storage policies from the namenode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to