[ https://issues.apache.org/jira/browse/HDFS-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344876#comment-16344876 ]

Rakesh R commented on HDFS-13084:
---------------------------------

Thanks a lot [~daryn] for your time and review comments. My replies follow; 
please take a look.

*Comment-1)*
{quote}BlockManager
 Shouldn’t spsMode be volatile? Although I question why it’s here.
{quote}
[Rakesh's reply] Agreed, will do the changes.
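
A minimal sketch of the reason for the change, assuming the flag is toggled by one thread (e.g. reconfiguration) while RPC handler threads read it; the holder class here is hypothetical, not the actual BlockManager:
{code:java}
// Hypothetical holder, illustrating why the flag must be volatile when
// one thread toggles it and other threads read it.
public class SpsModeHolder {
  // Without volatile, a reader thread may keep seeing a stale cached
  // value after reconfiguration; volatile guarantees visibility.
  private volatile boolean spsEnabled = false;

  public void reconfigure(boolean enabled) { // reconfiguration thread
    this.spsEnabled = enabled;
  }

  public boolean isSpsEnabled() { // RPC handler threads
    return spsEnabled;
  }
}
{code}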

*Comment-2)*
{quote}Adding SPS methods to this class implies an unexpected coupling of the 
SPS service to the block manager. Please move them out to prove it’s not 
tightly coupled.
{quote}
[Rakesh's reply] Agreed. I'm planning to create {{StoragePolicySatisfyManager}} 
and keep all the related APIs there.
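
A rough, hypothetical skeleton of that manager (method names are illustrative, not the final API), so that {{BlockManager}} only holds a reference and delegates:
{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical skeleton: BlockManager delegates to this class, so SPS
// state stays out of block-management internals.
public class StoragePolicySatisfyManager {
  private volatile boolean running;
  private final Queue<Long> spsPathIds = new ConcurrentLinkedQueue<>();

  public void start() { running = true; }
  public void stop() { running = false; }
  public boolean isRunning() { return running; }

  // SPS-facing calls previously added to BlockManager move here.
  public void addPathId(long inodeId) { spsPathIds.add(inodeId); }
  public Long pollNextPathId() { return spsPathIds.poll(); }
}
{code}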

*Comment-3)*
{quote}BPServiceActor
 Is it actually sending back the moved blocks? Aren’t IBRs sufficient?
{quote}
[Rakesh's reply] Currently, SPS is coupled with the heartbeat for sending the 
block move tasks to the DN and receiving the {{blksMovementsFinished}} list 
from the DN. Since the finished moves are grouped separately, each one can be 
quickly recognized by the Satisfier on the NN side, which can easily update its 
tracking details. If we used the IBR approach instead, then for every newly 
reported block the Satisfier on the NN side would have to check whether the 
block arrived because of an SPSBlockMove operation before updating the tracking 
details; a new block can be reported due to an SPSBlockMove op or as part of 
some other operation such as re-replication, a newly written file block, the 
Balancer tool, the Mover tool, etc. With the current approach, it simply gets 
the {{blksMovementsFinished}} list from each heartbeat and applies the logic.
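
To make the flow concrete, here is a minimal sketch of the DN-side idea under the current design (class and method names are illustrative): finished moves are queued locally and drained into the next heartbeat, so the NN-side Satisfier never has to inspect IBRs.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative DN-side tracker: completed block moves accumulate here
// and are handed to the heartbeat builder in one batch.
public class BlockMovementResultTracker {
  private final BlockingQueue<Long> finishedMoves =
      new LinkedBlockingQueue<>();

  // Called by the mover thread when a block move completes.
  public void markFinished(long blockId) {
    finishedMoves.offer(blockId);
  }

  // Called while building the next heartbeat: drain everything finished
  // since the last heartbeat into the blksMovementsFinished payload.
  public List<Long> drainForHeartbeat() {
    List<Long> batch = new ArrayList<>();
    finishedMoves.drainTo(batch);
    return batch;
  }
}
{code}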

*Comment-4)*
{quote}DataNode
 Why isn’t this just a block transfer? How is transferring between DNs any 
different than across storages?
{quote}
[Rakesh's reply] I could see the Mover is also using the {{REPLACE_BLOCK}} 
call, and we followed the same approach in SPS. Am I missing anything here?

*Comment-5)*
{quote}DatanodeDescriptor
 Why use a synchronized linked list to offer/poll instead of BlockingQueue?
{quote}
[Rakesh's reply] Agreed, will do the changes.
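
For reference, a minimal sketch of the suggested change (generic type for illustration): {{LinkedBlockingQueue}} is thread-safe on its own, so the explicit synchronization around offer/poll disappears.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative replacement for a synchronized LinkedList of pending
// storage-movement tasks in DatanodeDescriptor.
public class MovementTaskQueue<T> {
  private final BlockingQueue<T> tasks = new LinkedBlockingQueue<>();

  public boolean add(T task) {
    return tasks.offer(task); // thread-safe, no external locking needed
  }

  public T next() {
    return tasks.poll(); // returns null when empty, same as before
  }
}
{code}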

*Comment-6)*
{quote}DatanodeManager
 I know it’s configurable, but realistically, when would you ever want to give 
storage movement tasks equal footing with under-replication? Is there really a 
use case for not valuing durability?
{quote}
[Rakesh's reply] We don't have a particular use case, though. One scenario we 
considered: a user configures SSDs and they fill up quickly, in which case 
cleaning up could be treated as high priority. If you feel this is not a real 
case, then I'm OK with removing this config, and SPS will use only the 
remaining slots. If needed, we can bring in such configs later.

*Comment-7)*
{quote}Adding getDatanodeStorageReport is concerning. getDatanodeListForReport 
is already a very bad method that should be avoided for anything but jmx – even 
then it’s a concern. I eliminated calls to it years ago. All it takes is a 
nscd/dns hiccup and you’re left holding the fsn lock for an excessive length of 
time. Beyond that, the response is going to be pretty large and tagging all the 
storage reports is not going to be cheap.

verifyTargetDatanodeHasSpaceForScheduling does it really need the namesystem 
lock? Can’t DatanodeDescriptor#chooseStorage4Block synchronize on its 
storageMap?

Appears to be calling getLiveDatanodeStorageReport for every file. As mentioned 
earlier, this is NOT cheap. The SPS should be able to operate on a fuzzy/cached 
state of the world. Then it gets another datanode report to determine the 
number of live nodes to decide if it should sleep before processing the next 
path. The number of nodes from the prior cached view of the world should 
suffice.
{quote}
[Rakesh's reply] Good point. Some time back, Uma and I thought about the 
caching part. Actually, we depend on this API for the datanode storage types 
and remaining-space details. I think it requires two different mechanisms for 
internal and external SPS. For internal, how about SPS directly referring to 
{{DatanodeManager#datanodeMap}} for every file? For external, IIUC you are 
suggesting a caching mechanism. How about making a {{StorageGroup}}-like data 
structure, similar to the Mover class, and building a cache from it? This 
local cache could be refreshed per file or periodically.
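
A hypothetical sketch of that cache for the external SPS (the class and the {{Supplier}} hook are illustrative): the expensive storage-report fetch happens at most once per interval, and per-file checks read the possibly slightly stale snapshot.
{code:java}
import java.util.function.Supplier;

// Illustrative fuzzy cache: loader is the expensive call (e.g. fetching
// live datanode storage reports); get() refreshes it at most once per
// maxAgeMs and otherwise serves the cached snapshot.
public class FuzzyCache<T> {
  private final Supplier<T> loader;
  private final long maxAgeMs;
  private volatile T snapshot;
  private volatile long loadedAt;

  public FuzzyCache(Supplier<T> loader, long maxAgeMs) {
    this.loader = loader;
    this.maxAgeMs = maxAgeMs;
  }

  public synchronized T get() {
    long now = System.currentTimeMillis();
    if (snapshot == null || now - loadedAt > maxAgeMs) {
      snapshot = loader.get(); // one expensive fetch per interval
      loadedAt = now;
    }
    return snapshot;
  }
}
{code}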

*Comment-8)*
{quote}DFSUtil
 {{DFSUtil.removeOverlapBetweenStorageTypes}} and 
{{DFSUtil.getSPSWorkMultiplier}}. These aren’t generally useful methods so why 
are they in DFSUtil? Why aren’t they in the only calling class 
StoragePolicySatisfier?
{quote}
[Rakesh's reply] Agreed, will do the changes.

*Comment-9)*
{quote}FSDirXAttrOp
 Not fond of the SPS queue updates being edge triggered by xattr changes. Food 
for thought: why add more rpc methods if the client can just twiddle the xattr?
{quote}
[Rakesh's reply] SPS depends on the xattr. On each user satisfy call it 
updates the xattr and adds the inodeId to the {{spsPathIds}} queue. Again, on 
NN restart, it checks for the xattr key and, if it exists, adds the inodeId to 
the {{spsPathIds}} queue. With this flow, we decided to place the 
{{spsPathIds}} entry addition alongside the xattr update. Am I missing any 
specific case?
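
A simplified sketch of that flow (method names and the xattr key are illustrative): the xattr is the persistent record, and the in-memory queue is rebuilt from it on restart, which is why the two updates sit together.
{code:java}
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative flow: the xattr persists the pending-satisfy state; the
// queue is the in-memory view that the Satisfier drains.
public class SpsPathTracker {
  private final Queue<Long> spsPathIds = new ConcurrentLinkedQueue<>();

  void onSatisfyStoragePolicy(long inodeId) {
    setXAttr(inodeId, "user.hdfs.sps"); // persisted (edit log / fsimage)
    spsPathIds.add(inodeId);            // in-memory work queue
  }

  void onNamenodeRestart(long inodeId, List<String> xattrNames) {
    if (xattrNames.contains("user.hdfs.sps")) {
      spsPathIds.add(inodeId);          // re-queue unfinished work
    }
  }

  private void setXAttr(long inodeId, String name) {
    // placeholder for the real xattr update
  }
}
{code}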

*Comment-10)*
{quote}FSNamesystem / NamenodeProtocolTranslatorPB
 Most of the api changes appear unnecessary.

No need for the new getFileInfo when the existing getLocatedFileInfo appears to 
do exactly what the new method does?

No need for the new getFilePath. It’s only used by 
IntraSPSNameNodeContext#getFileInfo to get the path for an inode number, 
followed by calling the new getFileInfo cited above.

IntraSPSNameNodeContext#getFileInfo swallows all IOEs, based on assumption that 
any and all IOEs means FNF which probably isn’t the intention during rpc 
exceptions.

Should be able to replace it all with 
getLocatedFileInfo(“/.reserved/.inodes/XXX”, false) which avoids changing the 
public apis.
{quote}
[Rakesh's reply] Adding some background: in order to minimise memory usage, we 
maintain a list of {{inodeIds}} rather than path objects. So we need to fetch 
the file details by passing the {{inodeId}}, which is the reason for the new 
getFilePath(inodeId). For getFileInfo, I've used the getLocatedFileInfo API in 
{{ExternalSPSContext#getFileInfo}}. {{IntraSPSNameNodeContext}} keeps a 
Namesystem reference, and the getFileInfo API does not exist on it. Can SPS 
keep an FSNamesystem reference instead of its interface reference?
 Let me do some research on this part.
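
To illustrate the reviewer's suggestion (a sketch using the existing {{getLocatedFileInfo}} client API; the helper class itself is hypothetical): an inode id can be resolved through the reserved-inodes path syntax, avoiding any new public API.
{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

// Hypothetical helper: look up file info by inode id via the reserved
// path, so no new getFilePath/getFileInfo RPCs are needed.
public class InodeLookup {
  static HdfsFileStatus getFileInfoByInodeId(DFSClient client, long inodeId)
      throws IOException {
    String reservedPath = "/.reserved/.inodes/" + inodeId;
    // A deleted inode surfaces as FileNotFoundException during path
    // resolution, so other IOExceptions are not conflated with FNF.
    return client.getLocatedFileInfo(reservedPath, false);
  }
}
{code}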

*Comment-11)*
{quote}HdfsServerConstants
 The xattr is called user.hdfs.sps.xattr. Why does the xattr name actually 
contain the word “xattr”?
{quote}
[Rakesh's reply] Sure, will remove the word “xattr”.

*Comment-12)*
{quote}NameNode
 Super trivial but using the plural pronoun “we” in this exception message is 
odd. Changing the value isn’t a joint activity.

For enabling or disabling storage policy satisfier, we must pass either 
none/internal/external string value only
{quote}
[Rakesh's reply] Oops, sorry for the mistake. Will change it.

*Comment-13)*
{quote}StoragePolicySatisfier
 It appears to make back-to-back calls to hasLowRedundancyBlocks and 
getFileInfo for every file. Haven’t fully groked the code, but if low 
redundancy is not the common case, then it shouldn’t be called unless/until 
needed. It looks like files that are under replicated are re-queued again?
{quote}
[Rakesh's reply] Yes, under-replicated files are queued up again. For external 
SPS, I can think of another approach: iterate over all the blocks and check 
each block's location array length; if it is less than the replication factor, 
mark the file as under-replicated. Does this make sense?
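
A minimal sketch of that alternative check (helper name illustrative; uses the standard {{LocatedBlocks}} client types):
{code:java}
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

// Illustrative external-SPS check: a file counts as under-replicated if
// any block reports fewer locations than the replication factor.
public class RedundancyCheck {
  static boolean hasLowRedundancyBlocks(LocatedBlocks blocks,
      short replication) {
    for (LocatedBlock blk : blocks.getLocatedBlocks()) {
      if (blk.getLocations().length < replication) {
        return true; // re-queue the file instead of scheduling moves now
      }
    }
    return false;
  }
}
{code}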

*Comment-14)*
{quote}Appears to be calling getStoragePolicy for every file. It’s not like the 
byte values change all the time, why call and cache all of them via 
FSN#getStoragePolicies?
{quote}
[Rakesh's reply] Below is the current logic we use. Sorry, I failed to 
understand the caching part. Could you please elaborate a bit? Thanks!
{code:java}
ExternalSPSContext.java
  @Override
  public BlockStoragePolicy getStoragePolicy(byte policyId) {
    return createDefaultSuite.getPolicy(policyId);
  }

IntraSPSNameNodeContext.java
  @Override
  public BlockStoragePolicy getStoragePolicy(byte policyID) {
    return blockManager.getStoragePolicy(policyID);
  }
{code}
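
One reading of the comment, as a sketch (hypothetical class; assumes the policies are fetched once, e.g. via {{FSN#getStoragePolicies}}): cache the whole policy suite locally and serve per-file lookups from a map, since the byte ids rarely change.
{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hdfs.protocol.BlockStoragePolicy;

// Hypothetical cache: load all policies once, then answer per-file
// getStoragePolicy(byte) lookups locally with no further calls.
public class CachedPolicySuite {
  private final Map<Byte, BlockStoragePolicy> byId = new HashMap<>();

  public CachedPolicySuite(BlockStoragePolicy[] allPolicies) {
    for (BlockStoragePolicy p : allPolicies) {
      byId.put(p.getId(), p);
    }
  }

  public BlockStoragePolicy getStoragePolicy(byte policyId) {
    return byId.get(policyId);
  }
}
{code}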

> [SPS]: Fix the branch review comments
> -------------------------------------
>
>                 Key: HDFS-13084
>                 URL: https://issues.apache.org/jira/browse/HDFS-13084
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Uma Maheswara Rao G
>            Assignee: Rakesh R
>            Priority: Major
>
> Fix the review comments provided by [~daryn]
>  


