sodonnel commented on code in PR #6558:
URL: https://github.com/apache/ozone/pull/6558#discussion_r1583142984
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/NodeManager.java:
##########
@@ -119,6 +122,24 @@ default void
registerSendCommandNotify(SCMCommandProto.Type type,
List<DatanodeDetails> getNodes(
NodeOperationalState opState, NodeState health);
+ /**
+ * Gets all Live Datanodes that is currently communicating with SCM.
+ * The result is always not null.
+ * @param opStates - The operational states of the node
+ * @param health - The health of the node
+ * @return List of Datanodes that are Heartbeating SCM.
+ */
+ default List<DatanodeDetails> getNodes(
Review Comment:
This new method does not address the performance concern I raised. It
essentially calls the original getNodes() method once for each of the four
out-of-service states. Each of those calls has to iterate over all nodes in the
cluster and return the set of nodes that are out of service.
The nodes picked by the policy still have to be checked again before they are
returned. Although I originally suggested this solution, I am no longer sure it
is a good one. It may be better to look at the retry count and allow more
retries when the failure reason is that the node is not in service, or to use a
larger retry count when the cluster is large. That way the common case, in
which no nodes are out of service, does not pay a performance penalty on every
call.
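
To make the retry idea concrete, here is a rough sketch of what I have in
mind. It is not Ozone code: the class, the method, the budget constants and
the three functional parameters are all hypothetical names made up for
illustration, and the real placement policy API will look different. The
point is only that each picked candidate is checked individually, and that a
rejection caused by an out-of-service node draws from a separate, larger
retry budget instead of scanning the whole cluster up front.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names exist in Ozone. */
final class RetryBudgetSketch {

  /** Assumed base budget, spent on any kind of rejection. */
  static final int BASE_ATTEMPTS = 3;

  /** Assumed extra budget, spent only on out-of-service rejections. */
  static final int EXTRA_ATTEMPTS_OUT_OF_SERVICE = 5;

  /**
   * Picks candidates until one passes both checks or the budgets run out.
   * No list of out-of-service nodes is built, so when every node is in
   * service the only added cost is one predicate call per pick.
   */
  static <N> Optional<N> choose(Supplier<N> picker,
      Predicate<N> inService, Predicate<N> otherwiseUsable) {
    int baseLeft = BASE_ATTEMPTS;
    int extraLeft = EXTRA_ATTEMPTS_OUT_OF_SERVICE;
    while (baseLeft > 0) {
      N candidate = picker.get();
      if (!inService.test(candidate)) {
        // Use the extra budget first so out-of-service picks do not
        // consume the base budget until the extra one is exhausted.
        if (extraLeft > 0) {
          extraLeft--;
        } else {
          baseLeft--;
        }
        continue;
      }
      if (!otherwiseUsable.test(candidate)) {
        // Any other rejection reason is charged to the base budget.
        baseLeft--;
        continue;
      }
      return Optional.of(candidate);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Four out-of-service nodes are picked before the usable one.
    Iterator<String> picks =
        List.of("dn1", "dn2", "dn3", "dn4", "dn5").iterator();
    Optional<String> chosen =
        choose(picks::next, n -> n.equals("dn5"), n -> true);
    System.out.println(chosen);
  }
}
```

With only the base budget of three attempts, the example in main() would give
up before reaching dn5. The extra budget could also be scaled with the cluster
size rather than being a constant.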