[ 
https://issues.apache.org/jira/browse/HDFS-16639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557512#comment-17557512
 ] 

BugFinder commented on HDFS-16639:
----------------------------------

Looking at our in-house tool logs and doing our best to correlate them with 
possible cases, we have seen that this operation *might be a contributing 
factor* (only a factor, since we cannot claim that the whole problem is due to 
it) in the following cases. All of these are call trees that start by taking 
some wide lock and end in this resize operation. I believe this operation 
contributed to the associated issues, {*}maybe not as the main cause, but by 
adding a few seconds to something{*}:

"Ops" paths:

method (lock) (possibly related report) (status)
 * org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setReplication (write 
lock) (HDFS-16070) (open)
 * org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete (write lock) 
(HDFS-13831) (resolved)
 * org.apache.hadoop.hdfs.server.namenode.FSDirAppendOp.appendFile (write lock) 
(HDFS-14366) (resolved)
 * org.apache.hadoop.hdfs.server.namenode.FSDirConcatOp.concat (write lock) 
(None) (none)
 * and, in general, any Op that goes through BlockManager.setReplication, 
BlockManager.removeBlockFromMap, InvalidateBlocks.remove, or 
BlockManager.removeStaleReplicas
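The "possibly quadratic" case can arise when a set is allowed to both grow and shrink around the same boundary. This matters for the removal paths listed above if, as assumed here, removals can also trigger a resize. The cost model below is a textbook illustration and not LightWeightHashSet's actual policy. `TableModel`, `churn`, and the thresholds are hypothetical. The model assumes the table doubles once size exceeds capacity and halves once size drops to `capacity / shrinkDivisor`, and that each resize rehashes every live entry. With `shrinkDivisor = 2`, alternating one removal and one addition at the boundary forces a resize on every operation. As a result, n such operations on a set of about n entries cost on the order of n² rehashes.

```java
public class ResizeThrashing {

    /** Hypothetical cost model of a resizable table; it only counts rehashed entries. */
    static final class TableModel {
        private static final int MIN_CAPACITY = 16; // assumed floor for shrinking
        private final int shrinkDivisor;            // shrink when size * shrinkDivisor <= capacity
        private int capacity = MIN_CAPACITY;
        private int size = 0;
        long totalMoves = 0;                        // entries rehashed by all resizes so far

        TableModel(int shrinkDivisor) { this.shrinkDivisor = shrinkDivisor; }

        void add() {
            size++;
            if (size > capacity) {
                resize(capacity * 2); // grow once size exceeds capacity
            }
        }

        void remove() {
            size--;
            if (capacity > MIN_CAPACITY && (long) size * shrinkDivisor <= capacity) {
                resize(capacity / 2); // shrink once the table is sparse enough
            }
        }

        private void resize(int newCapacity) {
            totalMoves += size; // every live entry is rehashed once
            capacity = newCapacity;
        }
    }

    /**
     * Fills the table to `fill` entries, then runs `pairs` remove/add pairs.
     * Choose fill = 2^k + 1 so the table sits just past a growth boundary.
     */
    static long churn(int shrinkDivisor, int fill, int pairs) {
        TableModel t = new TableModel(shrinkDivisor);
        for (int i = 0; i < fill; i++) {
            t.add();
        }
        t.totalMoves = 0; // count only the churn phase
        for (int i = 0; i < pairs; i++) {
            t.remove();
            t.add();
        }
        return t.totalMoves;
    }

    public static void main(String[] args) {
        long naive = churn(2, 1025, 1000);      // shrink at half full: thrashes
        long hysteresis = churn(4, 1025, 1000); // shrink at quarter full: stable
        System.out.println("naive=" + naive + " hysteresis=" + hysteresis);
        // prints: naive=2049000 hysteresis=0
    }
}
```

The gap between the two runs is the design point. A shrink threshold well below the growth threshold (hysteresis) keeps a single boundary from triggering repeated full rehashes. Without it, every operation at the boundary pays for the whole set.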

Other non-"Ops" paths include:
 * org.apache.hadoop.hdfs.server.namenode.FSNamesystem.handleHeartbeat (read 
lock) (HDFS-16613) (resolved)
 * org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode (write 
lock) (HDFS-14186) (reopened)
 * org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits (write 
lock) (None) (none)

> LightWeightHashSet.resize possibly quadratic behavior could affect performance
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-16639
>                 URL: https://issues.apache.org/jira/browse/HDFS-16639
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.0.0, 3.3.3
>            Reporter: BugFinder
>            Priority: Major
>
> Hi,
> We have been profiling the performance and scale of a few HDFS versions 
> (including 3.0.0 and 3.3.3) with our in-house tools and have noticed some 
> places for possible optimization. According to what we have seen, the method 
> org.apache.hadoop.hdfs.util.LightWeightHashSet.resize
> has possibly quadratic behavior (linear at least), which might be impactful 
> depending on what data is stored in the instance (e.g. too many blocks to be 
> removed, as in https://issues.apache.org/jira/browse/HDFS-16574). Although this 
> behavior might be reasonable or even unnoticeable in some cases, when it runs 
> under wide locks, as in
> FSNamesystem.reportBadBlocks *// Holding the write lock*
>   BlockManager.findAndMarkBlockAsCorrupt
>     BlockManager.markBlockAsCorrupt
>      BlockManager.addToInvalidates
>         InvalidateBlocks.add
>           LightWeightHashSet.add
>             LightWeightHashSet.expandIfNecessary
>               LightWeightHashSet.resize
> it could become an issue and a possible source of performance degradation.
> Several call trees that hold locks seem to end in resize, so improving it 
> could uplift NameNode performance in many cases. Of course, not all of these 
> are bad, or better said, not all of them are problematic in every workload. 
> We do not have a proposal for a solution yet, as we are doing exploratory 
> work with our in-house tools. We believe this issue is present not only in 
> 3.0.0 and 3.3.3 but also in other versions.
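The "linear at least" part of the quoted call chain can be illustrated with a minimal sketch. It assumes a chained table that doubles once size exceeds capacity × 0.75. `SimpleHashSet`, `addUnderWriteLock`, and the constants are hypothetical. They only mimic the shape add → expandIfNecessary → resize under a write lock and are not Hadoop code. Most adds do no rehashing. The one add that crosses the threshold, however, rehashes every existing entry while the lock is held.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ResizeUnderLock {

    /** Hypothetical minimal chained hash set; a simplified stand-in, not Hadoop's LightWeightHashSet. */
    static final class SimpleHashSet {
        private static final float MAX_LOAD_FACTOR = 0.75f; // assumed expansion threshold
        private static final int INITIAL_CAPACITY = 16;     // assumed initial table size

        private static final class Node {
            final long value;
            Node next;
            Node(long value, Node next) { this.value = value; this.next = next; }
        }

        private Node[] table = new Node[INITIAL_CAPACITY];
        private int size;
        long lastResizeMoves; // entries rehashed by the most recent resize

        boolean add(long value) {
            int idx = indexFor(value, table.length);
            for (Node n = table[idx]; n != null; n = n.next) {
                if (n.value == value) return false; // already present
            }
            table[idx] = new Node(value, table[idx]);
            size++;
            expandIfNecessary();
            return true;
        }

        private void expandIfNecessary() {
            if (size > table.length * MAX_LOAD_FACTOR) {
                resize(table.length * 2);
            }
        }

        /** Re-links every entry into a new table: one call does O(size) work. */
        private void resize(int newCapacity) {
            Node[] newTable = new Node[newCapacity];
            long moves = 0;
            for (Node head : table) {
                Node n = head;
                while (n != null) {
                    Node next = n.next;
                    int idx = indexFor(n.value, newCapacity);
                    n.next = newTable[idx];
                    newTable[idx] = n;
                    n = next;
                    moves++;
                }
            }
            table = newTable;
            lastResizeMoves = moves;
        }

        private static int indexFor(long value, int capacity) {
            return (Long.hashCode(value) & 0x7fffffff) % capacity;
        }

        int size() { return size; }
    }

    /** Hypothetical caller shaped like reportBadBlocks -> ... -> add: the add runs under a write lock. */
    static long addUnderWriteLock(ReentrantReadWriteLock lock, SimpleHashSet set, long blockId) {
        lock.writeLock().lock();
        try {
            set.lastResizeMoves = 0;
            set.add(blockId);
            return set.lastResizeMoves; // rehash work done while the write lock was held
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        SimpleHashSet set = new SimpleHashSet();
        long worst = 0;
        for (long id = 0; id < 1_000_000; id++) {
            worst = Math.max(worst, addUnderWriteLock(lock, set, id));
        }
        System.out.println("size=" + set.size() + " worstSingleAddMoves=" + worst);
    }
}
```

With these assumed constants, `main` prints `size=1000000 worstSingleAddMoves=786433`. The amortized cost per add stays small, but the worst single add under the lock grows linearly with the set size. That single slow add is the kind of pause the comment suggests could add seconds to an operation holding a wide lock.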



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
