[
https://issues.apache.org/jira/browse/HDFS-11447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890241#comment-15890241
]
Xiaoyu Yao commented on HDFS-11447:
-----------------------------------
Thanks [~anu] for the review.
bq. nit: Storage Location Report – Rename ?? – But I don't have a better
suggestion.
How about Storage Report? I'll keep it as-is for now.
ContainerLocationManager.java
Line 109: I see that we have a TODO to handle failed volumes.
bq. Just a suggestion: Do you want to put the code in the getScmUsed() and
getAvailable() in a try catch so if it throws, you still can get info about the
other locations.
Good catch, fixed.
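A minimal sketch of what the per-location try/catch might look like (the StorageLocation interface and the getScmUsed()/getAvailable()/getPath() names here are assumptions for illustration, not the actual patch code):
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class NodeReportBuilder {
  // Assumed minimal shape of a storage location for this sketch.
  interface StorageLocation {
    String getPath();
    long getScmUsed() throws IOException;
    long getAvailable() throws IOException;
  }

  // Query each location inside its own try/catch so one failing volume
  // does not prevent the other locations from being reported.
  List<String> buildReport(List<StorageLocation> locations) {
    List<String> reports = new ArrayList<>();
    for (StorageLocation location : locations) {
      try {
        long used = location.getScmUsed();
        long available = location.getAvailable();
        reports.add(location.getPath() + " used=" + used
            + " available=" + available);
      } catch (IOException e) {
        // TODO: handle the failed volume; skip it for now so the
        // remaining locations are still reported.
        System.err.println("Skipping failed location "
            + location.getPath() + ": " + e);
      }
    }
    return reports;
  }
}
{code}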
bq. We already have a proper shutdown call in StateMachine; would it make sense
to save the value in that code instead of adding a shutdown hook? It would be
simpler and all shutdown code would be in one place.
Good point. I added plumbing to call saveScmUsed via the datanode statemachine
shutdown. I decided to keep the JVM shutdown hook for forced-shutdown cases
where the datanode statemachine shutdown may not get invoked. However, if the
datanode statemachine shutdown has already run, the JVM hook will be a no-op
due to the scmUsedSaved flag.
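Roughly, the interaction between the two shutdown paths could look like the following sketch (class and method names such as ScmUsedSaver and persistScmUsed are made up for illustration; only saveScmUsed and the scmUsedSaved flag come from the discussion):
{code}
import java.util.concurrent.atomic.AtomicBoolean;

class ScmUsedSaver {
  private final AtomicBoolean scmUsedSaved = new AtomicBoolean(false);

  ScmUsedSaver() {
    // The JVM hook covers forced shutdowns where the datanode statemachine
    // shutdown may never be invoked.
    Runtime.getRuntime().addShutdownHook(new Thread(this::saveScmUsed));
  }

  // Called from the datanode statemachine shutdown path.
  void shutdown() {
    saveScmUsed();
  }

  private void saveScmUsed() {
    // compareAndSet makes the second caller a no-op, so the value is
    // persisted at most once regardless of which path runs first.
    if (scmUsedSaved.compareAndSet(false, true)) {
      persistScmUsed();
    }
  }

  private void persistScmUsed() {
    // Placeholder for writing the cached scmUsed value to local storage.
  }
}
{code}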
bq. moveStaleNodeToDead() – We update ScmStats in this function, but not in
healthyToStaleNodes. Just to make sure that I understand this clearly, does
this mean that we send traffic to stale nodes ?
No, this is just a counter update on the SCM side. We will not send traffic to
stale nodes.
bq. handleHeartbeat() – In the earlier code monotonicNow() was used since the
recvTime could be different from when the HB processing thread actually ran. So
instead of penalizing the HB for its wait time in the queue, we just update the
last HB time as the time we saw it. That was a conscious decision: if we start
using the recvTime, then we also need to make sure that our response time is
within a reasonable time frame, that is, we need to guarantee the HBs will get
processed within a window of time. If we update the code to use monotonicNow()
we avoid this problem of packet starvation due to queueing and the HB
processing thread. See the comments on line 431 about how the HB thread assumes
that no issues come from not being able to run in real time. In fact, I suggest
that we use the difference between hbItem.getRecvTimestamp() and the current
time as a metric, which lets us know the average time a HB packet sits in the
queue. It is a very useful metric to have.
Good point. I will fix that in the next patch.
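A sketch of the queue-latency metric being suggested (hbItem.getRecvTimestamp() comes from the discussion above; the HeartbeatQueueMetrics class and its bookkeeping are assumptions for illustration):
{code}
import java.util.concurrent.atomic.AtomicLong;

class HeartbeatQueueMetrics {
  private final AtomicLong totalQueueTimeMs = new AtomicLong();
  private final AtomicLong processedHeartbeats = new AtomicLong();

  // Called by the HB processing thread for each dequeued heartbeat item,
  // passing the receive timestamp recorded when the HB was enqueued.
  void recordDequeue(long recvTimestampMs) {
    long queuedMs = monotonicNowMs() - recvTimestampMs;
    totalQueueTimeMs.addAndGet(queuedMs);
    processedHeartbeats.incrementAndGet();
  }

  // Average time a heartbeat spends waiting in the queue before processing.
  double averageQueueTimeMs() {
    long count = processedHeartbeats.get();
    return count == 0 ? 0.0 : (double) totalQueueTimeMs.get() / count;
  }

  private static long monotonicNowMs() {
    // Stand-in for Hadoop's Time.monotonicNow().
    return System.nanoTime() / 1_000_000;
  }
}
{code}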
bq. handleHeartbeat() – Do we need to call this function 3 times inside 3
separate if statements, or can we move it outside and call it once?
updateNodeStat(datanodeID, nodeReport);
It is updated only if the datanodeID is contained in healthyNodes, staleNodes,
or deadNodes. We could add the following check outside, but I chose to keep it
separate in each branch, similar to how the existing code updates the
healthyNodes timestamp (in 3 places).
{code}
if (healthyNodes.containsKey(datanodeID)
    || staleNodes.containsKey(datanodeID)
    || deadNodes.containsKey(datanodeID)) {
  updateNodeStat(datanodeID, nodeReport);
}
{code}
> Ozone: SCM: Send node report along with heartbeat to SCM
> --------------------------------------------------------
>
> Key: HDFS-11447
> URL: https://issues.apache.org/jira/browse/HDFS-11447
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Attachments: HDFS-11447-HDFS-7240.001.patch
>
>
> The storage utilization information on datanode should be reported to SCM to
> help decide container allocation.