[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344176#comment-16344176 ]
Daryn Sharp commented on HDFS-10285:
------------------------------------

Still working my way through it. Here are comments/questions so far.

*BlockManager*
Shouldn’t {{spsMode}} be volatile? Although I question why it’s here at all.

*BPServiceActor*
Is it actually sending back the moved blocks? Aren’t IBRs sufficient?

*DataNode*
Why isn’t this just a block transfer? How is transferring between DNs any different than across storages?

*DatanodeDescriptor*
Why use a synchronized linked list for offer/poll instead of a {{BlockingQueue}}?

*DatanodeManager*
I know it’s configurable, but realistically, when would you ever want to give storage-movement tasks equal footing with under-replication? Is there really a use case for not valuing durability?

Adding {{getDatanodeStorageReport}} is concerning. {{getDatanodeListForReport}} is already a very bad method that should be avoided for anything but jmx – even then it’s a concern. I eliminated calls to it years ago. All it takes is an nscd/dns hiccup and you’re left holding the fsn lock for an excessive length of time. Beyond that, the response is going to be pretty large, and tagging all the storage reports is not going to be cheap.

{{verifyTargetDatanodeHasSpaceForScheduling}} – does it really need the namesystem lock? Can’t {{DatanodeDescriptor#chooseStorage4Block}} synchronize on its {{storageMap}}?

*DFSUtil*
{{DFSUtil.removeOverlapBetweenStorageTypes}} and {{DFSUtil.getSPSWorkMultiplier}}: these aren’t generally useful methods, so why are they in {{DFSUtil}}? Why aren’t they in the only calling class, {{StoragePolicySatisfier}}?

*BlockManager*
Adding SPS methods to this class implies an unexpected coupling of the SPS service to the block manager. Please move them out to prove it’s not tightly coupled.

*FSDirXAttrOp*
Not fond of the SPS queue updates being edge-triggered by xattr changes. Food for thought: why add more rpc methods if the client can just twiddle the xattr?
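The {{DatanodeDescriptor}} point above – a thread-safe queue instead of a synchronized linked list – could be sketched roughly as follows. This is a hypothetical illustration, not the actual HDFS code; {{BlockMoveQueue}} and {{BlockItem}} are stand-ins for the real block-movement task types.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a LinkedBlockingQueue is already thread-safe,
// so offer/poll need no external synchronization, unlike a LinkedList
// guarded by synchronized blocks.
public class BlockMoveQueue {
    static class BlockItem {
        final long blockId;
        BlockItem(long blockId) { this.blockId = blockId; }
    }

    private final BlockingQueue<BlockItem> queue = new LinkedBlockingQueue<>();

    boolean add(BlockItem item) {
        return queue.offer(item); // non-blocking; false only if bounded and full
    }

    BlockItem next() {
        return queue.poll(); // returns null when empty, never blocks
    }

    public static void main(String[] args) {
        BlockMoveQueue q = new BlockMoveQueue();
        q.add(new BlockItem(42L));
        System.out.println(q.next().blockId); // prints 42
        System.out.println(q.next());         // prints null (queue drained)
    }
}
```

A bounded queue (passing a capacity to the constructor) would also give back-pressure for free if the movement tasks can pile up faster than DNs consume them.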
*FSNamesystem / NamenodeProtocolTranslatorPB*
Most of the api changes appear unnecessary. No need for the new {{getFileInfo}} when the existing {{getLocatedFileInfo}} appears to do exactly what the new method does. No need for the new {{getFilePath}}: it’s only used by {{IntraSPSNameNodeContext#getFileInfo}} to get the path for an inode number, followed by calling the new {{getFileInfo}} cited above. {{IntraSPSNameNodeContext#getFileInfo}} swallows all IOEs, based on the assumption that any and all IOEs mean FNF, which probably isn’t the intention during rpc exceptions. Should be able to replace it all with {{getLocatedFileInfo("/.reserved/.inodes/XXX", false)}}, which avoids changing the public apis.

*HdfsServerConstants*
The xattr is called {{user.hdfs.sps.xattr}}. Why does the xattr name actually contain the word “xattr”?

*NameNode*
Super trivial, but using the plural pronoun “we” in this exception message is odd. Changing the value isn’t a joint activity. :)
bq. For enabling or disabling storage policy satisfier, we must pass either none/internal/external string value only

*StoragePolicySatisfier*
It appears to make back-to-back calls to {{hasLowRedundancyBlocks}} and {{getFileInfo}} for every file. Haven’t fully grokked the code, but if low redundancy is not the common case, then it shouldn’t be called unless/until needed. It looks like files that are under-replicated are re-queued again?

Appears to be calling {{getStoragePolicy}} for every file. It’s not like the byte values change all the time – why not fetch them all once via {{FSN#getStoragePolicies}} and cache them?

Appears to be calling {{getLiveDatanodeStorageReport}} for every file. As mentioned earlier, this is NOT cheap. The SPS should be able to operate on a fuzzy/cached state of the world. Then it gets another datanode report to determine the number of live nodes, to decide if it should sleep before processing the next path. The number of nodes from the prior cached view of the world should suffice.
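The IOE-swallowing concern above can be shown with a small self-contained sketch. This is not the HDFS code; {{lookup}} is a hypothetical stand-in for the {{getFileInfo}} rpc, but it illustrates why catching all {{IOException}}s and treating them as file-not-found hides genuine rpc failures.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical illustration of the review comment: only
// FileNotFoundException should be mapped to "no file"; other
// IOExceptions (e.g. rpc failures) must propagate.
public class FileInfoLookup {

    // Stand-in for the getFileInfo rpc: one path is genuinely
    // missing, another simulates an rpc-layer failure.
    static String lookup(String path) throws IOException {
        if (path.equals("/missing")) throw new FileNotFoundException(path);
        if (path.equals("/rpc-down")) throw new IOException("connection reset");
        return "info:" + path;
    }

    // Anti-pattern: swallowing every IOE maps an rpc failure to "no file".
    static String swallowAll(String path) {
        try { return lookup(path); } catch (IOException e) { return null; }
    }

    // Fix: catch only FNF; other IOEs reach the caller.
    static String narrowCatch(String path) throws IOException {
        try { return lookup(path); } catch (FileNotFoundException e) { return null; }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(swallowAll("/rpc-down")); // prints null: failure silently hidden
        System.out.println(narrowCatch("/missing")); // prints null: file genuinely absent
        System.out.println(narrowCatch("/a"));       // prints info:/a
        // narrowCatch("/rpc-down") would throw, which is the desired behavior
    }
}
```

With the narrow catch, a transient rpc problem surfaces to the caller for retry instead of being misread as a deleted file.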
> Storage Policy Satisfier in Namenode
> ------------------------------------
>
> Key: HDFS-10285
> URL: https://issues.apache.org/jira/browse/HDFS-10285
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Affects Versions: HDFS-10285
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Priority: Major
> Attachments: HDFS-10285-consolidated-merge-patch-00.patch,
> HDFS-10285-consolidated-merge-patch-01.patch,
> HDFS-10285-consolidated-merge-patch-02.patch,
> HDFS-10285-consolidated-merge-patch-03.patch,
> HDFS-10285-consolidated-merge-patch-04.patch,
> HDFS-10285-consolidated-merge-patch-05.patch,
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf,
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf,
> Storage-Policy-Satisfier-in-HDFS-May10.pdf,
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These policies can be set on a directory or file to specify the user’s preference for where the physical blocks should be stored. When the user sets the storage policy before writing data, the blocks can take advantage of the policy preference and be stored accordingly.
> If the user sets the storage policy after the file has been written and completed, the blocks will already have been written with the default storage policy (i.e. DISK). The user then has to run the ‘Mover tool’ explicitly, specifying all such file names as a list. In some distributed-system scenarios (e.g. HBase) it would be difficult to collect all the files and run the tool, as different nodes can write files independently and the files can have different paths.
> Another scenario: when the user renames a file under one effective storage policy (inherited from its parent directory) into a directory with a different storage policy, the inherited policy is not copied from the source; the file instead takes its effective policy from the destination parent directory.
> This rename operation is just a metadata change in the Namenode; the physical blocks still remain under the source storage policy.
> So, tracking all such business-logic-driven file names from distributed nodes (e.g. region servers) and running the Mover tool could be difficult for admins.
> The proposal here is to provide an API in the Namenode itself to trigger storage policy satisfaction. A daemon thread inside the Namenode would track such calls and dispatch movement commands to the DNs.
> Will post the detailed design thoughts document soon.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)