[ https://issues.apache.org/jira/browse/HDFS-17627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882941#comment-17882941 ]
Jian Zhang commented on HDFS-17627:
-----------------------------------

[~hnzhu] Can you describe the scenario or environment and the number of replicas under which the {{getStaleReplicas}} operation encounters performance bottlenecks?

> Performance optimization on BlockUnderConstructionFeature
> ---------------------------------------------------------
>
> Key: HDFS-17627
> URL: https://issues.apache.org/jira/browse/HDFS-17627
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.3.0
> Reporter: Hao-Nan Zhu
> Priority: Minor
>
> Hi, I’ve encountered performance bottlenecks in _blockmanagement.BlockUnderConstructionFeature_ and I wonder if there's a chance for optimization.
>
> [_getStaleReplicas()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L219] may cause performance degradation when the list of replicas is large. The method uses an *ArrayList* to collect stale replicas, which could cause memory re-allocations and potential OOM errors when the number of stale replicas increases.
> Furthermore, [_getStaleReplicas()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L219] could also cause lock contention on some code paths, for example:
> [_updatePipelineInternal()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L6054] (holding the global lock) ->
> [_updateLastBlock()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L4965] ->
> [_setGenerationStampAndVerifyReplicas()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java#L426] ->
> [_getStaleReplicas()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L219].
>
> One optimization would be to pre-size the ArrayList based on the actual number of replicas (i.e. _List<ReplicaUnderConstruction> staleReplicas = new ArrayList<>(replicas.length)_), which would minimize the number of resizes and reallocations. Another would be to maintain a persistent list of _staleReplicas_, so that there is no need to iterate over the replicas.
>
> The same issue could also happen with [_appendUCPartsConcise()_|https://github.com/apache/hadoop/blob/6be04633b55bbd67c2875e39977cd9d2308dc1d1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L349].
> It takes in a StringBuilder with a [default size of 150|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java#L792] characters, which risks resizing when the number of replicas is large. Within _BlockUnderConstructionFeature_, other similar issues exist, including [_addReplicaIfNotPresent()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L294] and [_setExpectedLocations()_|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockUnderConstructionFeature.java#L74].
>
> Please let me know if there is something wrong with the analysis above, or if you have any comments on the optimization. Thanks!
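
For reference, the pre-sizing idea from the issue description can be sketched in isolation as below. This is not the Hadoop code: StaleReplicaSketch, Replica, newSizedBuilder and CHARS_PER_REPLICA are hypothetical stand-ins invented for illustration, and the per-replica character estimate is a made-up value. Only the 150-character base and the new ArrayList<>(replicas.length) expression come from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in only; not the Hadoop BlockUnderConstructionFeature class. */
final class StaleReplicaSketch {

  /** Hypothetical minimal replica record carrying only a generation stamp. */
  static final class Replica {
    private final long generationStamp;

    Replica(long generationStamp) {
      this.generationStamp = generationStamp;
    }

    long getGenerationStamp() {
      return generationStamp;
    }
  }

  /** Assumed rough per-replica character estimate; a made-up value for illustration. */
  static final int CHARS_PER_REPLICA = 32;

  /**
   * Collects the replicas whose generation stamp differs from genStamp.
   * The list is pre-sized to replicas.length, so add() never has to grow
   * the backing array, at the cost of over-allocating when few are stale.
   */
  static List<Replica> getStaleReplicas(Replica[] replicas, long genStamp) {
    List<Replica> staleReplicas = new ArrayList<>(replicas.length);
    for (Replica r : replicas) {
      if (genStamp != r.getGenerationStamp()) {
        staleReplicas.add(r);
      }
    }
    return staleReplicas;
  }

  /** Sizes a StringBuilder from the replica count instead of a fixed 150 characters. */
  static StringBuilder newSizedBuilder(int replicaCount) {
    return new StringBuilder(150 + replicaCount * CHARS_PER_REPLICA);
  }

  public static void main(String[] args) {
    Replica[] replicas = { new Replica(7L), new Replica(8L), new Replica(7L) };
    List<Replica> stale = getStaleReplicas(replicas, 7L);
    System.out.println("stale replicas: " + stale.size());
  }
}
```

Pre-sizing trades a possibly oversized backing array for the absence of growth copies. Whether that is measurable depends on how many replicas a block under construction actually has, which is what the comment above asks the reporter to clarify.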