[ 
https://issues.apache.org/jira/browse/HDFS-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344688#comment-16344688
 ] 

Uma Maheswara Rao G commented on HDFS-13084:
--------------------------------------------

Comments from [~daryn] :

*BlockManager*

Shouldn’t spsMode be volatile? Although I question why it’s here.

*BPServiceActor*
Is it actually sending back the moved blocks? Aren’t IBRs sufficient?

*DataNode*
Why isn’t this just a block transfer? How is transferring between DNs any 
different than across storages?

*DatanodeDescriptor*
Why use a synchronized linked list to offer/poll instead of BlockingQueue?

*DatanodeManager*
I know it’s configurable, but realistically, when would you ever want to give 
storage movement tasks equal footing with under-replication? Is there really a 
use case for not valuing durability?

Adding {{getDatanodeStorageReport}} is concerning. {{getDatanodeListForReport}} 
is already a very bad method that should be avoided for anything but jmx – even 
then it’s a concern. I eliminated calls to it years ago. All it takes is a 
nscd/dns hiccup and you’re left holding the fsn lock for an excessive length of 
time. Beyond that, the response is going to be pretty large and tagging all the 
storage reports is not going to be cheap.

{{verifyTargetDatanodeHasSpaceForScheduling}} does it really need the 
namesystem lock? Can’t {{DatanodeDescriptor#chooseStorage4Block}} synchronize 
on its {{storageMap}}?

*DFSUtil*
{{DFSUtil.removeOverlapBetweenStorageTypes}} and {{DFSUtil.getSPSWorkMultiplier
}}. These aren’t generally useful methods so why are they in {{DFSUtil}}? Why 
aren’t they in the only calling class {{StoragePolicySatisfier}}?

*BlockManager*
Adding SPS methods to this class implies an unexpected coupling of the SPS 
service to the block manager. Please move them out to prove it’s not tightly 
coupled.

*FSDirXAttrOp*
Not fond of the SPS queue updates being edge triggered by xattr changes. Food 
for thought: why add more rpc methods if the client can just twiddle the xattr?

*FSNamesystem / NamenodeProtocolTranslatorPB*
Most of the api changes appear unnecessary.

No need for the new {{getFileInfo}} when the existing {{getLocatedFileInfo}} 
appears to do exactly what the new method does?

No need for the new {{getFilePath}}. It’s only used by 
{{IntraSPSNameNodeContext#getFileInfo}} to get the path for an inode number, 
followed by calling the new {{getFileInfo}} cited above.

{{IntraSPSNameNodeContext#getFileInfo}} swallows all IOEs, based on assumption 
that any and all IOEs means FNF which probably isn’t the intention during rpc 
exceptions.

Should be able to replace it all with 
{{getLocatedFileInfo(“/.reserved/.inodes/XXX”, false)}} which avoids changing 
the public apis.

*HdfsServerConstants*
The xattr is called {{user.hdfs.sps.xattr}}. Why does the xattr name actually 
contain the word “xattr”?

*NameNode*
Super trivial but using the plural pronoun “we” in this exception message is 
odd. Changing the value isn’t a joint activity. :)
{quote}For enabling or disabling storage policy satisfier, we must pass either 
none/internal/external string value only
{quote}
*StoragePolicySatisfier*
It appears to make back-to-back calls to {{hasLowRedundancyBlocks}} and 
{{getFileInfo}} for every file. Haven’t fully groked the code, but if low 
redundancy is not the common case, then it shouldn’t be called unless/until 
needed. It looks like files that are under replicated are re-queued again?

Appears to be calling {{getStoragePolicy}} for every file. It’s not like the 
byte values change all the time, why call and cache all of them via 
{{FSN#getStoragePolicies}}?

Appears to be calling {{getLiveDatanodeStorageReport}} for every file. As 
mentioned earlier, this is NOT cheap. The SPS should be able to operate on a 
fuzzy/cached state of the world. Then it gets another datanode report to 
determine the number of live nodes to decide if it should sleep before 
processing the next path. The number of nodes from the prior cached view of the 
world should suffice.

> [SPS]: Fix the branch review comments
> -------------------------------------
>
>                 Key: HDFS-13084
>                 URL: https://issues.apache.org/jira/browse/HDFS-13084
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Uma Maheswara Rao G
>            Assignee: Rakesh R
>            Priority: Major
>
> Fix the review comments provided by [~daryn]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to