[ 
https://issues.apache.org/jira/browse/HDFS-11447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890241#comment-15890241
 ] 

Xiaoyu Yao commented on HDFS-11447:
-----------------------------------

Thanks [~anu] for the review. 

bq. nit: Storage Location Report – Rename ?? – But I don't have a better 
suggestion.
How about Storage Report? Keep it as-is for now.

ContainerLocationManager.java
Line 109: I see that we have todo to handle failed volume.
bq. Just a suggestion: Do you want to put the code in the getScmUsed() and 
getAvailable() in a try catch so if it throws, you still can get info about the 
other locations.

Good catch, fixed.

bq. We already have proper shutdown call in StateMachine, would it makes sense 
to save the value in that code insted of adding a shutdown hook. It will 
simpler and all shutdown code will be in one place.

Good point. I add plumbing of saveScmUsed via the datanode statemachine 
shutdown. I decide to keep the JVM shutdown hook for other force shutdown cases 
where datanode statemachine shutdown may not get invoked. However, if the 
datanode statemachine shutdown has been invoked, the JVM hook will be a noop 
due to the scmUsedSaved flag.

bq. moveStaleNodeToDead() – We update ScmStats in this function, but not in 
healthyToStaleNodes. Just to make sure that I understand this clearly, does 
this mean that we send traffic to stale nodes ?

No, this is just counter update on SCM side. We will not send traffic to stale 
nodes. 

bq. handleHeartbeat() In the earlier code monotonicNow() was used since the 
recvTime could be different from when the HB processing thread actually ran. So 
instead of penalizing HB wait time in the queue, we just update the last HB 
time as the time we saw it. That was a conscious decision, if we start updating 
the recvTime, then we also need to make sure that our response time is within a 
reasonable time frame. That is we need to guarantee the HB's will get processed 
within a window of time. if we update the code to use monotonicNow() we avoid 
this problem of packet starvation due to queue and HB processing thread. See 
the comments in line 431 about how HB thread assumes that no issues comes from 
not being able to run in real time. In fact, I suggest that we use the 
hbItem.getRecvTimestamp() and current time difference as a metric which lets us 
know the average time we queue a HB packet. It is very useful metric to have.

Good point. I will fix that in the next patch. 

bq. handleHeartbeat() – Do we need to call this function 3 times inside 3 
separate if statements or can we move it outside once ? 
updateNodeStat(datanodeID, nodeReport);

It is updated only if the datanodeID is contained in either healthyNodes, 
staleNodes or deadNodes. We could add the following outside but I choose to 
keep it separately similar to the existing update healthyNodes timestamp (3 
times).
{code} 
if (healthyNodes.containsKey(datanodeID) || staleNodes.containsKey(datanodeID) 
|| deadNodes.containsKey(datanodeID)) {
  updateNodeStat(datanodeID, nodeReport);
}
{code}

> Ozone: SCM: Send node report along with heartbeat to SCM
> --------------------------------------------------------
>
>                 Key: HDFS-11447
>                 URL: https://issues.apache.org/jira/browse/HDFS-11447
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>         Attachments: HDFS-11447-HDFS-7240.001.patch
>
>
> The storage utilization information on datanode should be reported to SCM to 
> help decide container allocation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to