[ 
https://issues.apache.org/jira/browse/HDFS-16639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BugFinder updated HDFS-16639:
-----------------------------
    Description: 
Hi,

We have been profiling the performance and scalability of several HDFS
versions (including 3.0.0 and 3.3.3) with our in-house tools, and we have
noticed some places where optimizations may be possible. From what we have
seen, the method

org.apache.hadoop.hdfs.util.LightWeightHashSet.resize

has possibly quadratic behavior (linear at the least), which might have an
impact depending on the data stored in the instance (e.g. too many blocks to
be removed, as in https://issues.apache.org/jira/browse/HDFS-16574). Although
this behavior might be reasonable or even unnoticeable in some cases, a resize
that runs under a wide lock, as in

FSNamesystem.reportBadBlocks *// Holding the write lock*
  BlockManager.findAndMarkBlockAsCorrupt
    BlockManager.markBlockAsCorrupt
      BlockManager.addToInvalidates
        InvalidateBlocks.add
          LightWeightHashSet.add
            LightWeightHashSet.expandIfNecessary
              LightWeightHashSet.resize

could become an issue and a possible source of performance degradation.
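
To make the concern concrete, here is a minimal sketch of a chained hash set whose resize relinks every stored element while a write lock is held. It is an illustration only: the class ResizeCostSketch, its fields, and the method addUnderWriteLock are hypothetical names, and the code is not the actual LightWeightHashSet implementation. It only shows the at-least-linear cost of a single resize, not the possibly quadratic behavior mentioned above.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified chained hash set. NOT the real LightWeightHashSet.
final class ResizeCostSketch {
    private static final class Node {
        final long value;
        Node next;
        Node(long value, Node next) { this.value = value; this.next = next; }
    }

    // Stand-in for a wide lock such as the namesystem write lock (assumption).
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Node[] buckets = new Node[16]; // capacity stays a power of two
    int size = 0;
    long totalRelinked = 0;       // nodes moved by all resizes so far
    long maxRelinkedInOneAdd = 0; // most nodes moved during one locked add

    private static int indexFor(long value, int capacity) {
        return Long.hashCode(value) & (capacity - 1);
    }

    // Adds a value while holding the write lock and records how many
    // nodes were relinked before the lock was released.
    boolean addUnderWriteLock(long value) {
        lock.writeLock().lock();
        try {
            long before = totalRelinked;
            boolean added = add(value);
            maxRelinkedInOneAdd =
                Math.max(maxRelinkedInOneAdd, totalRelinked - before);
            return added;
        } finally {
            lock.writeLock().unlock();
        }
    }

    private boolean add(long value) {
        int i = indexFor(value, buckets.length);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.value == value) {
                return false; // already present
            }
        }
        buckets[i] = new Node(value, buckets[i]);
        size++;
        if (size > buckets.length * 3 / 4) {
            resize(buckets.length * 2); // expand past a 0.75 load factor
        }
        return true;
    }

    // Relinks every stored node into a new bucket array: linear in size.
    private void resize(int newCapacity) {
        Node[] old = buckets;
        buckets = new Node[newCapacity];
        for (Node head : old) {
            while (head != null) {
                Node next = head.next;
                int i = indexFor(head.value, newCapacity);
                head.next = buckets[i];
                buckets[i] = head;
                totalRelinked++;
                head = next;
            }
        }
    }

    public static void main(String[] args) {
        ResizeCostSketch set = new ResizeCostSketch();
        for (long v = 0; v < 100_000; v++) {
            set.addUnderWriteLock(v);
        }
        System.out.println("elements: " + set.size);
        System.out.println("total relinked: " + set.totalRelinked);
        System.out.println("max relinked in one add: " + set.maxRelinkedInOneAdd);
    }
}
```

In this sketch the total relinking work stays proportional to the number of adds, but a single add that triggers a resize can relink nearly the whole set before the lock is released, which is the kind of pause the call chain above could expose.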

Several call trees appear to end in resize while holding locks, so an
improvement there could boost NameNode (NN) performance in many cases. Of
course, not all of these call trees are problematic in every workload. We do
not have a proposed solution yet, as we are still doing exploratory work with
our in-house tools. We believe this issue is present not only in 3.0.0 and
3.3.3 but also in other versions.

  was:
Hi,

We have been profiling the performance and scalability of several HDFS
versions (including 3.0.0 and 3.3.3) with our in-house tools, and we have
noticed some places where optimizations may be possible. From what we have
seen, the method

org.apache.hadoop.hdfs.util.LightWeightHashSet.resize

has possibly quadratic behavior (linear at the least), which might have an
impact depending on the data stored in the instance (e.g. too many blocks to
be removed, as in https://issues.apache.org/jira/browse/HDFS-16574). Although
this behavior might be reasonable or even unnoticeable in some cases, a resize
that runs under a wide lock, as in

FSNamesystem.reportBadBlocks *// Holding the write lock*
  BlockManager.findAndMarkBlockAsCorrupt
    BlockManager.markBlockAsCorrupt
      BlockManager.addToInvalidates
        InvalidateBlocks.add
          LightWeightHashSet.add
            LightWeightHashSet.expandIfNecessary
              LightWeightHashSet.resize

could become an issue and a possible source of performance degradation.

Several call trees appear to end in resize while holding locks, so an
improvement there could boost NameNode (NN) performance in many cases. Of
course, not all of these call trees are problematic in every workload. We do
not have a proposed solution yet, as we are still doing exploratory work with
our in-house tools. We believe this issue is present not only in 3.0.0 and
3.3.3 but also in more recent versions.


> LightWeightHashSet.resize possibly quadratic behavior could affect performance
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-16639
>                 URL: https://issues.apache.org/jira/browse/HDFS-16639
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.0.0, 3.3.3
>            Reporter: BugFinder
>            Priority: Major
>
> Hi,
> We have been profiling the performance and scalability of several HDFS
> versions (including 3.0.0 and 3.3.3) with our in-house tools, and we have
> noticed some places where optimizations may be possible. From what we have
> seen, the method
> org.apache.hadoop.hdfs.util.LightWeightHashSet.resize
> has possibly quadratic behavior (linear at the least), which might have an
> impact depending on the data stored in the instance (e.g. too many blocks to
> be removed, as in https://issues.apache.org/jira/browse/HDFS-16574).
> Although this behavior might be reasonable or even unnoticeable in some
> cases, a resize that runs under a wide lock, as in
> FSNamesystem.reportBadBlocks *// Holding the write lock*
>   BlockManager.findAndMarkBlockAsCorrupt
>     BlockManager.markBlockAsCorrupt
>       BlockManager.addToInvalidates
>         InvalidateBlocks.add
>           LightWeightHashSet.add
>             LightWeightHashSet.expandIfNecessary
>               LightWeightHashSet.resize
> could become an issue and a possible source of performance degradation.
> Several call trees appear to end in resize while holding locks, so an
> improvement there could boost NameNode (NN) performance in many cases. Of
> course, not all of these call trees are problematic in every workload. We do
> not have a proposed solution yet, as we are still doing exploratory work
> with our in-house tools. We believe this issue is present not only in 3.0.0
> and 3.3.3 but also in other versions.


