[ https://issues.apache.org/jira/browse/HDFS-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141690#comment-14141690 ]
Andrew Wang commented on HDFS-7081: ----------------------------------- Hey guys, this isn't a full review, but I wanted to just get this stuff out there so we have the weekend to mull it over. This is my first time looking through this code in detail, so some of my comments extend beyond the immediate scope of this patch. I've tried to split it up accordingly, so we can do some of this in a follow-on. High-level: * Any reason we don't also expose java APIs for setting/getting the policy in HdfsAdmin? Need to have public classes then too. * Does DFSAdmin have a way of getting the list of storage policies? Not sure how you'd know what policy to set if you don't know what's available. * Also a bit surprised that the API is string based rather than ID based. Making it ID based would be more efficient and also allow renaming policies if desired. FSNamesystem: * Need javadoc on new getter Not related to this patch: Dir-level storage policies: * The handling of directory storage policies seems inconsistent with rename. If you rename an UNSPECIFIED file, it will pick up a potentially different storage ID from its destination. However, if it already has one set (e.g. it was created under a dir with a set policy), it'll keep that policy. Have you considered always setting the file's storage policy to the default? Then rename would be consistent. The other alternative is doing away with per-file storage policies and *only* specifying them on a directory. This also seems reasonable to me from a management perspective. * This seems worth documenting at the least. * We should also consider having tooling that can do a recursive setStoragePolicy for easier administration. BlockStoragePolicySuite: * In the class javadoc, "Suite" by itself doesn't mean much to me. Could we mention "collection of storage policies"? * The xattr name "bsp" is not very descriptive. Maybe "hsm.block.storage.policy.id" instead? We dedupe the names in memory, so the size of the name only really matters when it's serialized to the fsimage. I think we could dedupe them there too with a little work. * We also shouldn't be using the trusted namespace, since it pollutes a namespace that's meant for use only by the root user. I'd recommend system instead. * BlockStoragePolicy#readBlockStorageSuite(conf) doesn't seem to be used, could remove some other helper functions then too * Would prefer if we didn't initialize DEFAULT_SUITE until it's needed, avoid some static init cost. Can be straight up removed if you remove readBlockStorageSuite(conf). * Config keys belong in DFSConfigKeys.java * Has some lines longer than 80 chars BlockStoragePolicy: * Should be using slf4j for new logging * I'd strongly consider a builder pattern for this constructor, since it seems likely/possible that we'll want to add more fields here. * {{diff}}: I'm not sure what a "list difference" is, but it's not a set difference since it doesn't handle duplicates. Update javadoc about the expectations? > Add new DistributedFileSystem API for getting all the existing storage > policies > ------------------------------------------------------------------------------- > > Key: HDFS-7081 > URL: https://issues.apache.org/jira/browse/HDFS-7081 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, namenode > Reporter: Jing Zhao > Assignee: Jing Zhao > Attachments: HDFS-7081.000.patch, HDFS-7081.001.patch, > HDFS-7081.002.patch > > > Instead of loading all the policies from a client side configuration file, it > may be better to provide Mover with a new RPC call for getting all the > storage policies from the namenode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)