[ 
https://issues.apache.org/jira/browse/HDFS-17627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882941#comment-17882941
 ] 

Jian Zhang commented on HDFS-17627:
-----------------------------------

[~hnzhu] Can you describe the scenario or environment and the number of 
replicas under which the {{getStaleReplicas}} operation encounters performance 
bottlenecks?

> Performance optimization on BlockUnderConstructionFeature
> ---------------------------------------------------------
>
>                 Key: HDFS-17627
>                 URL: https://issues.apache.org/jira/browse/HDFS-17627
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.3.0
>            Reporter: Hao-Nan Zhu
>            Priority: Minor
>
> Hi, I’ve encountered performance bottlenecks in 
> _blockmanagement.BlockUnderConstructionFeature_ and I wonder if there's a 
> chance for optimization.
>  
> [_getStaleReplica()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L219]
>  may cause performance degradation when the list of replicas is large. The 
> method uses an *ArrayList* to collect stale replicas, which could cause 
> memory re-allocations and potential OOM errors when the number of stale 
> replicas increases. Furthermore, 
> [_getStaleReplica()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L219]
>  could also cause lock contention on some code paths, such as:  
> [_updatePipelineInternal()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L6054]
>  (holding global lock) -> 
> [_updateLastBlock()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4965]
>  -> 
> [_setGenerationStampAndVerifyReplicas()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java#L426]
>  -> 
> [_getStaleReplica()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L219].
>  
>  
> One optimization would be to pre-size the ArrayList based on the actual number 
> of replicas (i.e. _List<ReplicaUnderConstruction> staleReplicas = new 
> ArrayList<>(replicas.length)_ ), which would avoid repeated resizing and 
> reallocation. Another option is to maintain a 
> persistent list of {_}staleReplicas{_}, so there is no need to iterate over 
> the replicas on each call.
>  
> The same issue could also occur in 
> [_appendUCPartsConcise()_|https://github.com/apache/hadoop/blob/6be04633b55bbd67c2875e39977cd9d2308dc1d1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L349].
>  It takes a StringBuilder with an [initial capacity of 
> 150|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java#L792]
>  characters, which risks resizing when the number of replicas is 
> large. Within {_}BlockUnderConstructionFeature{_}, other similar 
> issues exist, including 
> [_addReplicaIfNotPresent()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L294]
>  or 
> [_setExpectedLocations()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L74].
>  
> Please let me know if there is something wrong with the analysis above, or 
> if you have any comments on the optimization. Thanks!
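
For reference, the pre-sizing idea described above could look roughly like the sketch below. It is only an illustration, not the HDFS code: PresizeSketch, Replica, staleReplicas() and describe() are hypothetical stand-ins, the stale check (generation stamp mismatch) is a simplification, and the per-replica character estimate is a guess.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the real HDFS classes.
final class PresizeSketch {

  /** Simplified stand-in for ReplicaUnderConstruction: only a generation stamp. */
  static final class Replica {
    final long generationStamp;

    Replica(long generationStamp) {
      this.generationStamp = generationStamp;
    }
  }

  /** Collects the replicas whose generation stamp differs from genStamp. */
  static List<Replica> staleReplicas(Replica[] replicas, long genStamp) {
    // replicas.length is an upper bound on the result size, so add()
    // never has to grow the backing array.
    List<Replica> stale = new ArrayList<>(replicas.length);
    for (Replica r : replicas) {
      if (r.generationStamp != genStamp) {
        stale.add(r);
      }
    }
    return stale;
  }

  /** Builds a short description, sizing the StringBuilder from the replica count. */
  static String describe(Replica[] replicas) {
    // 24 characters per replica is an assumed estimate, not a measured value.
    StringBuilder sb = new StringBuilder(16 + 24 * replicas.length);
    sb.append("replicas=[");
    for (int i = 0; i < replicas.length; i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append("gs=").append(replicas[i].generationStamp);
    }
    return sb.append(']').toString();
  }

  public static void main(String[] args) {
    Replica[] replicas = {
      new Replica(5), new Replica(4), new Replica(5), new Replica(3)
    };
    System.out.println(staleReplicas(replicas, 5L).size());
    System.out.println(describe(replicas));
  }
}
```

Pre-sizing trades a possibly oversized allocation (when few replicas are stale) for the guarantee of no regrowth. With a small replica count the difference is likely negligible, which is why the number of replicas in the affected scenario matters.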



