[ https://issues.apache.org/jira/browse/HDFS-17598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hao-Nan Zhu updated HDFS-17598:
-------------------------------
Description:
Hello,

I wonder whether there are opportunities to optimize {_}DatanodeManager{_} a bit, for better performance when the number of _datanodes_ is large:

* [_fetchDatanodes_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1513] calls [_removeDecomNodeFromList_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1131] on both the live and the dead datanode lists, and _removeDecomNodeFromList_ has to iterate over every datanode in each list. This could be avoided by checking _node.isDecommissioned()_ before a node is ever added to the live or dead list (see the first sketch below).
* [_getNumLiveDataNodes_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1404] iterates over all datanodes, whereas [_getNumDeadDataNodes_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1417] obtains its count in a different (presumably more efficient) way. Is there a reason _getNumLiveDataNodes_ has to iterate over the whole {_}datanodeMap{_}? Could it use the same approach as _getNumDeadDataNodes_ (see the second sketch below)?
* Similar observations apply to [_resetLastCachingDirectiveSentTime_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L2097] and [_getDatanodeListForReport_|https://github.com/apache/hadoop/blob/f6c45e0bcf4aeaba31515e548dcc98b33245fe0e/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L1616].

Optimizing these methods could make these checks noticeably cheaper, especially as the number of datanodes grows. Are there any plans for this kind of large-scale (micro) optimization?
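To make the first point concrete, here is a minimal sketch of the idea. It is not the actual DatanodeManager code: the Node class is a stand-in for DatanodeDescriptor, and the signatures are simplified.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch only; a simplified stand-in for DatanodeManager. */
class FetchSketch {
  /** Minimal stand-in for DatanodeDescriptor. */
  static class Node {
    boolean decommissioned;
    boolean dead;
    boolean isDecommissioned() { return decommissioned; }
  }

  final Map<String, Node> datanodeMap = new ConcurrentHashMap<>();

  /**
   * Single pass over datanodeMap: filtering decommissioned nodes here is
   * one O(1) check per node, instead of adding them to the lists and then
   * sweeping both full lists again in removeDecomNodeFromList.
   */
  void fetchDatanodes(List<Node> live, List<Node> dead,
                      boolean removeDecommissionNode) {
    for (Node node : datanodeMap.values()) {
      if (removeDecommissionNode && node.isDecommissioned()) {
        continue; // previously: removed from live/dead after the fact
      }
      if (node.dead) {
        dead.add(node);
      } else {
        live.add(node);
      }
    }
  }
}
{code}

With tens of thousands of datanodes, this would save two extra full-list scans on every _fetchDatanodes_ call.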
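For the second point, one possible approach is to maintain the live count incrementally so that reading it is O(1). This is purely a sketch: the hook names besides _getNumLiveDataNodes_ are hypothetical, and the real DatanodeManager would have to fold these updates into its existing locking.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: keep the live count up to date instead of rescanning. */
class LiveCountSketch {
  // uuid -> alive? (TRUE = live, FALSE = declared dead)
  private final ConcurrentHashMap<String, Boolean> datanodeMap =
      new ConcurrentHashMap<>();
  private final AtomicInteger numLive = new AtomicInteger();

  /** Hypothetical hook for datanode registration / re-registration. */
  void addDatanode(String uuid) {
    Boolean prev = datanodeMap.put(uuid, Boolean.TRUE);
    if (prev == null || !prev) {
      numLive.incrementAndGet(); // new node, or a dead node coming back
    }
  }

  /** Hypothetical hook for the heartbeat monitor declaring a node dead. */
  void markDead(String uuid) {
    if (Boolean.TRUE.equals(datanodeMap.replace(uuid, Boolean.FALSE))) {
      numLive.decrementAndGet();
    }
  }

  /** O(1), versus the current O(n) scan over datanodeMap. */
  int getNumLiveDataNodes() {
    return numLive.get();
  }
}
{code}

The trade-off is that every liveness transition must go through these hooks; if any code path changes a node's state directly, the counter drifts, which may be why the current code recomputes the count on demand.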
Please let me know if I should provide more information. Thanks!

> Optimizations for DatanodeManager for large-scale cases
> -------------------------------------------------------
>
>                 Key: HDFS-17598
>                 URL: https://issues.apache.org/jira/browse/HDFS-17598
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: performance
>    Affects Versions: 3.4.0
>            Reporter: Hao-Nan Zhu
>            Priority: Minor

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org