[
https://issues.apache.org/jira/browse/HDFS-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jing Zhao updated HDFS-7081:
----------------------------
Attachment: HDFS-7081.003.patch
Thanks for the comments, Andrew. Upload a new patch to address your comments.
Since some of your comments on BlockStoragePolicySuite/BlockStoragePolicy are
easy to address, I just include them in this patch. Maybe we can create a
separate jira to address the remaining comments.
bq. Any reason we don't also expose java APIs for setting/getting the policy in
HdfsAdmin? Need to have public classes then too.
Initially we planned to allow normal users to set storage policy in the next
step of heterogeneous storage. Currently I just follow your comments
and add the setting policy API into HdfsAdmin. We can remove it in the future
if necessary. (For getting the policy it does not require superuser thus we do
not need to add it in HdfsAdmin).
bq. Does DFSAdmin have a way of getting the list of storage policies? Not sure
how you'd know what policy to set if you don't know what's available.
Getting the list of storage policies does not require superuser permission thus
actually we should add a separate tool for this (similar with
listSnapshottableDir which also allows a normal user to list snapshottble dirs
belong to him/her). I plan to add this in a separate jira since the current
patch is already big.
bq. Also a bit surprised that the API is string based rather than ID based.
Making it ID based would be more efficient and also allow renaming policies if
desired.
I guess you're referring to {{setStoragePolicy}} here? Currently internally
(inside of NameNode) we attach ID to files/directories (thus a policy can be
easily renamed), and we use String for end users to specify the policy, where
I think to use the policy name is much more user-friendly. And since it is not
common to set storage policy for a large amount of files/directories and the
name is usually short, the efficiency may not be an issue here.
bq. FSNamesystem: Need javadoc on new getter
Done. Thanks for pointing this out.
bq. The handling of directory storage policies seems inconsistent with rename.
If you rename an UNSPECIFIED file, it will pick up a potentially different
storage ID from its destination. However, if it already has one set (e.g. it
was created under a dir with a set policy), it'll keep that policy. Have you
considered always setting the file's storage policy to the default? Then rename
would be consistent. The other alternative is doing away with per-file storage
policies and only specifying them on a directory. This also seems reasonable to
me from a management perspective.
bq. This seems worth documenting at the least.
I think to have policy on both files/directories is necessary since in the
future we can have tools/services keeping scanning the namespace and marking
files/directories based on their access/modification time etc. Currently we do
not keep the UNSPECIFIED policy mainly because we want to allow users to set a
cold/warm directory and then to change data's temperature by moving
files/directories into it. We will add more details into the document.
bq. We should also consider having tooling that can do a recursive
setStoragePolicy for easier administration.
If we can set storage policy directly on a directory, why do we still need to do
it recursively? But to provide a tool for easier administration (not just for
setting storage policy) is always good.
bq. BlockStoragePolicySuite: In the class javadoc, "Suite" by itself doesn't
mean much to me. Could we mention "collection of storage policies"?
Done.
bq. The xattr name "bsp" is not very descriptive. Maybe
"hsm.block.storage.policy.id" instead? We dedupe the names in memory, so the
size of the name only really matters when it's serialized to the fsimage. I
think we could dedupe them there too with a little work.
Done. This is a very good suggestion.
bq. We also shouldn't be using the trusted namespace, since it pollutes a
namespace that's meant for use only by the root user. I'd recommend system
instead.
For this one I have a question. According to the current document "TRUSTED
namespace attributes are only visible and accessible to privileged users."
Currently the storage policy is actually set by superuser and in HDFS we do not
have root user. So does that mean we should use trusted here?
bq. BlockStoragePolicy#readBlockStorageSuite(conf) doesn't seem to be used,
could remove some other helper functions then too
bq. Would prefer if we didn't initialize DEFAULT_SUITE until it's needed, avoid
some static init cost. Can be straight up removed if you remove
readBlockStorageSuite(conf).
Actually I kept this part of code because I planed to reuse/move this part of
code in DFSAdmin for admin to set new policies based on configuration files.
Since this is out of the scope of this jira currently I will just remove all
the code.
bq. Has some lines longer than 80 chars
Done.
bq. BlockStoragePolicy: Should be using slf4j for new logging
Done for BlockStoragePolicy/BlockStoragePolicySuite.
bq. I'd strongly consider a builder pattern for this constructor, since it
seems likely/possible that we'll want to add more fields here.
Currently the BlockStoragePolicy has only one constructor. All its 5 fields are
final and must be specified without default values. So right now I do not see
the necessity of a builder here, but we can add it in the future if necessary.
bq. diff: I'm not sure what a "list difference" is, but it's not a set
difference since it doesn't handle duplicates. Update javadoc about the
expectations?
Done.
> Add new DistributedFileSystem API for getting all the existing storage
> policies
> -------------------------------------------------------------------------------
>
> Key: HDFS-7081
> URL: https://issues.apache.org/jira/browse/HDFS-7081
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer, namenode
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Attachments: HDFS-7081.000.patch, HDFS-7081.001.patch,
> HDFS-7081.002.patch, HDFS-7081.003.patch
>
>
> Instead of loading all the policies from a client side configuration file, it
> may be better to provide Mover with a new RPC call for getting all the
> storage policies from the namenode.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)