[GitHub] [ozone] sodonnel commented on a diff in pull request #3329: HDDS-6567. Store datanode command queue counts from heartbeat in DatanodeInfo in SCM

GitBox Fri, 29 Apr 2022 02:01:53 -0700


sodonnel commented on code in PR #3329:
URL: https://github.com/apache/ozone/pull/3329#discussion_r861612693



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DatanodeInfo.java:
##########
@@ -49,6 +57,7 @@ public class DatanodeInfo extends DatanodeDetails {
   private List<StorageReportProto> storageReports;
   private List<MetadataStorageReportProto> metadataStorageReports;
   private LayoutVersionProto lastKnownLayoutVersion;

Review Comment:
   NodeStateManager.checkNodesHealth is what notices the lost heartbeats and 
triggers events based on that.
   
   The DeadNodeHandler is triggered when the node goes dead (there is also a 
StaleNodeHandler), and clears out its pipelines etc. Perhaps we should reset 
the command counts when this happens, or perhaps it is valid to leave them at 
the last known value. The DatanodeInfo object is not removed AFAIK, as it holds 
the DN state (in_service, decommissioning, healthy, stale, dead etc). If the DN 
comes back, the counts will be reset by the heartbeat processing. If it never 
comes back, the DatanodeDetails and DatanodeInfo stick around in SCM until SCM 
is restarted.
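   
   As a rough sketch of the "reset on dead" option, assuming a simple per-node 
holder for the counts: every name below (SketchCommandType, NodeCommandCounts, 
setFromHeartbeat, reset) is hypothetical and only illustrates the idea, it is 
not the actual DatanodeInfo or DeadNodeHandler code.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical command types, for illustration only.
enum SketchCommandType { REPLICATE, DELETE, CLOSE }

// Hypothetical per-node holder for the queue counts reported in heartbeats.
class NodeCommandCounts {
  private final Map<SketchCommandType, Integer> counts =
      new EnumMap<>(SketchCommandType.class);

  // Heartbeat processing replaces the stored counts wholesale, so a node
  // that comes back overwrites whatever was left from before.
  synchronized void setFromHeartbeat(Map<SketchCommandType, Integer> reported) {
    counts.clear();
    counts.putAll(reported);
  }

  // What a dead-node handler could call, so stale values are not kept.
  synchronized void reset() {
    counts.clear();
  }

  // Returns -1 when no count is known for the given type.
  synchronized int get(SketchCommandType type) {
    return counts.getOrDefault(type, -1);
  }
}
```

   Returning -1 for "unknown" lets a caller tell a cleared count apart from a 
reported count of zero; leaving the last known value in place would simply 
mean never calling reset().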
   
   I am not sure that leftover command counts are a big issue, as we should 
avoid scheduling commands on dead (and maybe stale) nodes anyway. E.g. before 
scheduling a command for a node, we need to check it is HEALTHY, as otherwise 
the commands will be queued in SCM and never taken by a DN. If something in SCM 
keeps scheduling commands for dead nodes, the command queue will slowly fill up 
SCM memory.
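
   To illustrate the kind of guard I mean, here is a minimal sketch, assuming a 
queue keyed by node ID. SketchHealth, GuardedCommandQueue and its methods are 
made-up names for this example, not the real SCM command queue API.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical health states, for illustration only.
enum SketchHealth { HEALTHY, STALE, DEAD }

// Hypothetical queue that refuses commands for nodes that are not HEALTHY.
class GuardedCommandQueue {
  private final Map<String, SketchHealth> health = new HashMap<>();
  private final Map<String, Queue<String>> pending = new HashMap<>();

  void setHealth(String nodeId, SketchHealth state) {
    health.put(nodeId, state);
  }

  // Returns true if the command was queued, false if it was rejected
  // because the node is unknown, STALE or DEAD.
  boolean offer(String nodeId, String command) {
    if (health.get(nodeId) != SketchHealth.HEALTHY) {
      return false;
    }
    pending.computeIfAbsent(nodeId, k -> new ArrayDeque<>()).add(command);
    return true;
  }

  // Number of commands currently queued for the node.
  int pendingCount(String nodeId) {
    Queue<String> q = pending.get(nodeId);
    return q == null ? 0 : q.size();
  }
}
```

   With a check like this, nothing accumulates for a node that will never 
heartbeat again, so the queue cannot grow without bound for dead nodes.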



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

