[
https://issues.apache.org/jira/browse/HDFS-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141690#comment-14141690
]
Andrew Wang commented on HDFS-7081:
-----------------------------------
Hey guys, this isn't a full review, but I wanted to just get this stuff out
there so we have the weekend to mull it over. This is my first time looking
through this code in detail, so some of my comments extend beyond the immediate
scope of this patch. I've tried to split it up accordingly, so we can do some
of this in a follow-on.
High-level:
* Any reason we don't also expose java APIs for setting/getting the policy in
HdfsAdmin? Need to have public classes then too.
* Does DFSAdmin have a way of getting the list of storage policies? Not sure
how you'd know what policy to set if you don't know what's available.
* Also a bit surprised that the API is string based rather than ID based.
Making it ID based would be more efficient and also allow renaming policies if
desired.
FSNamesystem:
* Need javadoc on new getter
Not related to this patch:
Dir-level storage policies:
* The handling of directory storage policies seems inconsistent with rename. If
you rename an UNSPECIFIED file, it will pick up a potentially different storage
ID from its destination. However, if it already has one set (e.g. it was
created under a dir with a set policy), it'll keep that policy. Have you
considered always setting the file's storage policy to the default? Then rename
would be consistent. The other alternative is doing away with per-file storage
policies and *only* specifying them on a directory. This also seems reasonable
to me from a management perspective.
* This seems worth documenting at the least.
* We should also consider having tooling that can do a recursive
setStoragePolicy for easier administration.
BlockStoragePolicySuite:
* In the class javadoc, "Suite" by itself doesn't mean much to me. Could we
mention "collection of storage policies"?
* The xattr name "bsp" is not very descriptive. Maybe
"hsm.block.storage.policy.id" instead? We dedupe the names in memory, so the
size of the name only really matters when it's serialized to the fsimage. I
think we could dedupe them there too with a little work.
* We also shouldn't be using the trusted namespace, since it pollutes a
namespace that's meant for use only by the root user. I'd recommend system
instead.
* BlockStoragePolicy#readBlockStorageSuite(conf) doesn't seem to be used, could
remove some other helper functions then too
* Would prefer if we didn't initialize DEFAULT_SUITE until it's needed, avoid
some static init cost. Can be straight up removed if you remove
readBlockStorageSuite(conf).
* Config keys belong in DFSConfigKeys.java
* Has some lines longer than 80 chars
BlockStoragePolicy:
* Should be using slf4j for new logging
* I'd strongly consider a builder pattern for this constructor, since it seems
likely/possible that we'll want to add more fields here.
* {{diff}}: I'm not sure what a "list difference" is, but it's not a set
difference since it doesn't handle duplicates. Update javadoc about the
expectations?
> Add new DistributedFileSystem API for getting all the existing storage
> policies
> -------------------------------------------------------------------------------
>
> Key: HDFS-7081
> URL: https://issues.apache.org/jira/browse/HDFS-7081
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: balancer, namenode
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Attachments: HDFS-7081.000.patch, HDFS-7081.001.patch,
> HDFS-7081.002.patch
>
>
> Instead of loading all the policies from a client side configuration file, it
> may be better to provide Mover with a new RPC call for getting all the
> storage policies from the namenode.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)