[ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344176#comment-16344176 ]

Daryn Sharp commented on HDFS-10285:
------------------------------------

Still working my way through it.  Here are my comments and questions so far.

*BlockManager*
Shouldn’t {{spsMode}} be volatile?  Although I question why it’s here.
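To spell out the visibility concern, something like the following is what I mean (class and field names are just illustrative, not the actual {{BlockManager}} members):

{code:java}
// Minimal sketch of the visibility concern. Without volatile, the SPS worker
// thread may never observe a mode toggle performed by an admin RPC handler
// thread, since the update can sit in a register/cache indefinitely.
class SpsModeFlagSketch {
  private volatile boolean spsEnabled = false; // volatile => cross-thread visibility

  void enable() { spsEnabled = true; }         // e.g. flipped by an admin RPC handler
  boolean isEnabled() { return spsEnabled; }   // e.g. polled by the SPS worker thread
}
{code}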

*BPServiceActor*
Is it actually sending back the moved blocks?  Aren’t incremental block reports (IBRs) sufficient?

*DataNode*
Why isn’t this just a block transfer?  How is transferring between DNs any 
different than across storages?

*DatanodeDescriptor*
Why use a synchronized linked list to offer/poll instead of BlockingQueue?
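Something along these lines is what I’d expect instead (the task type and class names below are placeholders, not the actual SPS classes):

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Placeholder for the SPS block-movement work item.
class BlockMovementTask { }

// A BlockingQueue gives thread-safe offer/poll without wrapping a LinkedList
// in external synchronization, and adds blocking/timed variants if the
// consumer wants to wait for work instead of spinning.
class MovementTaskQueueSketch {
  private final BlockingQueue<BlockMovementTask> tasks = new LinkedBlockingQueue<>();

  void add(BlockMovementTask task) {
    tasks.offer(task);
  }

  BlockMovementTask next(long timeoutMs) throws InterruptedException {
    return tasks.poll(timeoutMs, TimeUnit.MILLISECONDS); // null if nothing arrives in time
  }
}
{code}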

*DatanodeManager*
I know it’s configurable, but realistically, when would you ever want to give 
storage movement tasks equal footing with under-replication?  Is there really a 
use case for not valuing durability?

Adding {{getDatanodeStorageReport}} is concerning.  {{getDatanodeListForReport}} is already a very bad method that should be avoided for anything but JMX – even then it’s a concern.  I eliminated calls to it years ago.  All it takes is an nscd/DNS hiccup and you’re left holding the fsn lock for an excessive length of time.  Beyond that, the response is going to be pretty large, and tagging all the storage reports is not going to be cheap.

Does {{verifyTargetDatanodeHasSpaceForScheduling}} really need the namesystem lock?  Can’t {{DatanodeDescriptor#chooseStorage4Block}} synchronize on its {{storageMap}}?
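The locking pattern I have in mind is roughly the following (types simplified for illustration; the real map holds storage-info objects rather than byte counts):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the point is to guard the per-datanode storageMap with
// its own lock instead of requiring the global namesystem lock for a simple
// "does any storage have enough space" check.
class DatanodeDescriptorSketch {
  private final Map<String, Long> storageMap = new HashMap<>(); // storageId -> remaining bytes

  boolean hasSpaceForBlock(long blockSize) {
    synchronized (storageMap) {  // per-descriptor lock, not the FSN lock
      for (long remaining : storageMap.values()) {
        if (remaining >= blockSize) {
          return true;
        }
      }
      return false;
    }
  }
}
{code}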

*DFSUtil*
{{DFSUtil.removeOverlapBetweenStorageTypes}} and {{DFSUtil.getSPSWorkMultiplier}}.  These aren’t generally useful methods, so why are they in {{DFSUtil}}?  Why aren’t they in the only calling class, {{StoragePolicySatisfier}}?

*BlockManager*
Adding SPS methods to this class implies an unexpected coupling of the SPS 
service to the block manager.  Please move them out to prove it’s not tightly 
coupled.

*FSDirXAttrOp*
Not fond of the SPS queue updates being edge triggered by xattr changes.  Food 
for thought: why add more rpc methods if the client can just twiddle the xattr?

*FSNamesystem / NamenodeProtocolTranslatorPB*
Most of the api changes appear unnecessary.

No need for the new {{getFileInfo}} when the existing {{getLocatedFileInfo}} appears to do exactly what the new method does.

No need for the new {{getFilePath}}.  It’s only used by 
{{IntraSPSNameNodeContext#getFileInfo}} to get the path for an inode number, 
followed by calling the new {{getFileInfo}} cited above.

{{IntraSPSNameNodeContext#getFileInfo}} swallows all IOEs, based on the assumption that any and all IOEs mean FNF, which probably isn’t the intention for RPC exceptions.

Should be able to replace it all with {{getLocatedFileInfo("/.reserved/.inodes/XXX", false)}}, which avoids changing the public APIs.
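Roughly, the lookup could look like this (the resolver interface below is just a stand-in for the NN-side context; the FNF handling is the part that matters):

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;

// Sketch of the suggested replacement: resolve the inode via the reserved path
// and only treat FileNotFoundException as "file gone", letting other IOEs
// (e.g. genuine RPC failures) propagate instead of being swallowed.
class InodeLookupSketch {
  interface FileInfoResolver {  // stand-in for the real NN-side context/API
    Object getLocatedFileInfo(String path, boolean needBlockToken) throws IOException;
  }

  static Object lookup(FileInfoResolver resolver, long inodeId) throws IOException {
    String path = "/.reserved/.inodes/" + inodeId;  // reserved-inode path form
    try {
      return resolver.getLocatedFileInfo(path, false);
    } catch (FileNotFoundException fnf) {
      return null;  // genuinely missing: safe to skip this file
    }               // any other IOException propagates to the caller
  }
}
{code}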

*HdfsServerConstants*
The xattr is called {{user.hdfs.sps.xattr}}.  Why does the xattr name actually 
contain the word “xattr”?

*NameNode*
Super trivial but using the plural pronoun “we” in this exception message is 
odd.  Changing the value isn’t a joint activity. :)
bq. For enabling or disabling storage policy satisfier, we must pass either 
none/internal/external string value only

*StoragePolicySatisfier*
It appears to make back-to-back calls to {{hasLowRedundancyBlocks}} and {{getFileInfo}} for every file.  I haven’t fully grokked the code, but if low redundancy is not the common case, then it shouldn’t be called unless/until needed.  It also looks like under-replicated files are re-queued?

Appears to be calling {{getStoragePolicy}} for every file.  It’s not like the byte values change all the time, so why not call {{FSN#getStoragePolicies}} once and cache all of them?
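i.e. something along these lines, built once (or refreshed rarely) instead of a call per file – the {{Policy}} type below is a simplified stand-in for the real policy class:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative cache: fetch all policies in one bulk call, then resolve each
// file's policy byte with a map lookup instead of a per-file call.
class StoragePolicyCacheSketch {
  static final class Policy {            // simplified stand-in for the real policy type
    final byte id;
    final String name;
    Policy(byte id, String name) { this.id = id; this.name = name; }
  }

  private final Map<Byte, Policy> policiesById = new HashMap<>();

  void refresh(Policy[] allPolicies) {   // populate from one bulk getStoragePolicies call
    policiesById.clear();
    for (Policy p : allPolicies) {
      policiesById.put(p.id, p);
    }
  }

  Policy forFile(byte policyId) {        // per-file lookup is now just a map hit
    return policiesById.get(policyId);
  }
}
{code}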

Appears to be calling {{getLiveDatanodeStorageReport}} for every file.  As 
mentioned earlier, this is NOT cheap.  The SPS should be able to operate on a 
fuzzy/cached state of the world.  Then it gets another datanode report to 
determine the number of live nodes to decide if it should sleep before 
processing the next path.  The number of nodes from the prior cached view of 
the world should suffice.
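To sketch the caching idea (types and refresh interval are illustrative; the real snapshot would be the storage report array):

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch of the "fuzzy/cached state of the world" idea: refresh the expensive
// datanode report on a slow interval and reuse that snapshot both for target
// selection and for the live-node count, instead of fetching it per file.
class CachedStorageReportSketch {
  static final class Report {             // illustrative stand-in for the storage report
    final int liveNodes;
    Report(int liveNodes) { this.liveNodes = liveNodes; }
  }

  private final Supplier<Report> fetcher; // e.g. wraps the expensive report call
  private final long maxAgeNanos;
  private volatile Report cached;
  private volatile long fetchedAtNanos;

  CachedStorageReportSketch(Supplier<Report> fetcher, long maxAgeSeconds) {
    this.fetcher = fetcher;
    this.maxAgeNanos = TimeUnit.SECONDS.toNanos(maxAgeSeconds);
  }

  Report get() {
    Report snapshot = cached;
    if (snapshot == null || System.nanoTime() - fetchedAtNanos > maxAgeNanos) {
      snapshot = fetcher.get();           // expensive call, done rarely
      cached = snapshot;
      fetchedAtNanos = System.nanoTime();
    }
    return snapshot;                      // possibly slightly stale, which is fine here
  }
}
{code}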


> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Major
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, 
> HDFS-10285-consolidated-merge-patch-01.patch, 
> HDFS-10285-consolidated-merge-patch-02.patch, 
> HDFS-10285-consolidated-merge-patch-03.patch, 
> HDFS-10285-consolidated-merge-patch-04.patch, 
> HDFS-10285-consolidated-merge-patch-05.patch, 
> HDFS-SPS-TestReport-20170708.pdf, SPS Modularization.pdf, 
> Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, 
> Storage-Policy-Satisfier-in-HDFS-May10.pdf, 
> Storage-Policy-Satisfier-in-HDFS-Oct-26-2017.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These 
> policies can be set on a directory or file to specify the user's preference for 
> where the physical blocks should be stored. When the user sets the storage 
> policy before writing data, the blocks can take advantage of the policy 
> preferences and the physical blocks are placed accordingly.
> If the user sets the storage policy after the file has been written and 
> completed, then the blocks will already have been written under the default 
> storage policy (i.e. DISK). The user has to run the ‘Mover’ tool explicitly, 
> specifying all such file names as a list. In some distributed-system scenarios 
> (e.g. HBase) it would be difficult to collect all the files and run the tool, 
> as different nodes can write files independently and the files can have 
> different paths.
> Another scenario: when the user renames a file from a directory with one 
> effective storage policy (inherited from the parent directory) into a directory 
> with a different storage policy, the inherited storage policy is not copied 
> from the source; the file takes the effective policy of the destination's 
> parent directory. The rename is just a metadata change in the Namenode, so the 
> physical blocks still remain placed according to the source storage policy.
> Tracking all such files (based on business logic) across distributed nodes 
> (e.g. region servers) and running the Mover tool could therefore be difficult 
> for admins. The proposal here is to provide an API in the Namenode itself to 
> trigger storage policy satisfaction. A daemon thread inside the Namenode would 
> track such calls and send movement commands to the DNs.
> A detailed design document will be posted soon.


