[ https://issues.apache.org/jira/browse/HDFS-17598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hao-Nan Zhu updated HDFS-17598:
-------------------------------
Description:
Hello,

I wonder whether there are opportunities to optimize {_}DatanodeManager{_} a bit, for better performance when the number of _datanodes_ is large:

* [_fetchDatanodes_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1513] calls [_removeDecomNodeFromList_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1131] on both the live and the dead datanode lists, and _removeDecomNodeFromList_ has to iterate over every datanode in each list. This could be avoided by checking _node.isDecommissioned()_ before a node is ever added to the live or dead list (see the first sketch below).
* [_getNumLiveDataNodes_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1404] iterates over all datanodes, whereas [_getNumDeadDataNodes_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1417] obtains its count in a different (presumably more efficient) way. Is there a reason _getNumLiveDataNodes_ has to iterate over the whole {_}datanodeMap{_}? Could it use the same approach as _getNumDeadDataNodes_ (see the second sketch below)?
* Similar observations apply to [_resetLastCachingDirectiveSentTime_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L2097] and [_getDatanodeListForReport_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1616].

Optimizing these methods could make these checks noticeably cheaper, especially as the number of datanodes grows. Are there any plans for this kind of large-scale (micro) optimization?
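To make the first point concrete, here is a minimal sketch of the idea. It is not the actual DatanodeManager code: the Node class is a stand-in for DatanodeDescriptor, and the signatures are simplified.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch only; a simplified stand-in for DatanodeManager. */
class FetchSketch {
  /** Minimal stand-in for DatanodeDescriptor. */
  static class Node {
    boolean decommissioned;
    boolean dead;
    boolean isDecommissioned() { return decommissioned; }
  }

  final Map<String, Node> datanodeMap = new ConcurrentHashMap<>();

  /**
   * Single pass over datanodeMap: filtering decommissioned nodes here is
   * one O(1) check per node, instead of adding them to the lists and then
   * sweeping both full lists again in removeDecomNodeFromList.
   */
  void fetchDatanodes(List<Node> live, List<Node> dead,
                      boolean removeDecommissionNode) {
    for (Node node : datanodeMap.values()) {
      if (removeDecommissionNode && node.isDecommissioned()) {
        continue; // previously: removed from live/dead after the fact
      }
      if (node.dead) {
        dead.add(node);
      } else {
        live.add(node);
      }
    }
  }
}
{code}

With tens of thousands of datanodes, this would save two extra full-list scans on every _fetchDatanodes_ call.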
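For the second point, one possible approach is to maintain the live count incrementally so that reading it is O(1). This is purely a sketch: the hook names besides _getNumLiveDataNodes_ are hypothetical, and the real DatanodeManager would have to fold these updates into its existing locking.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: keep the live count up to date instead of rescanning. */
class LiveCountSketch {
  // uuid -> alive? (TRUE = live, FALSE = declared dead)
  private final ConcurrentHashMap<String, Boolean> datanodeMap =
      new ConcurrentHashMap<>();
  private final AtomicInteger numLive = new AtomicInteger();

  /** Hypothetical hook for datanode registration / re-registration. */
  void addDatanode(String uuid) {
    Boolean prev = datanodeMap.put(uuid, Boolean.TRUE);
    if (prev == null || !prev) {
      numLive.incrementAndGet(); // new node, or a dead node coming back
    }
  }

  /** Hypothetical hook for the heartbeat monitor declaring a node dead. */
  void markDead(String uuid) {
    if (Boolean.TRUE.equals(datanodeMap.replace(uuid, Boolean.FALSE))) {
      numLive.decrementAndGet();
    }
  }

  /** O(1), versus the current O(n) scan over datanodeMap. */
  int getNumLiveDataNodes() {
    return numLive.get();
  }
}
{code}

The trade-off is that every liveness transition must go through these hooks; if any code path changes a node's state directly, the counter drifts, which may be why the current code recomputes the count on demand.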
Please let me know if I should provide more information. Thanks!

> Optimizations for DatanodeManager for large-scale cases
> -------------------------------------------------------
>
>                 Key: HDFS-17598
>                 URL: https://issues.apache.org/jira/browse/HDFS-17598
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: performance
>    Affects Versions: 3.4.0
>            Reporter: Hao-Nan Zhu
>            Priority: Minor

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org