[ https://issues.apache.org/jira/browse/HDFS-17638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Goodness Ayinmode updated HDFS-17638:
-------------------------------------

Description:

Hi, I was looking into the DatanodeStorageInfo class and I think some of its methods could cause problems at large scale.

For example, to convert DatanodeStorageInfo objects into their respective DatanodeDescriptor and storage ID forms, [DatanodeStorageInfo.toDatanodeInfos()|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java#L44] and [DatanodeStorageInfo.toStorageIDs()|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java#L61] iterate over the entire array of storages. Each operation is linear on its own, but performance issues can arise when they are called under a lock, as in [bumpBlockGenerationStamp|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L5987], where [newLocatedBlock|https://github.com/apache/hadoop/blob/49a495803a9451850b8982317e277b605c785587/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L5461] calls both methods while the write lock is held (bumpBlockGenerationStamp -> newLocatedBlock -> ([newLocatedBlock|https://github.com/apache/hadoop/blob/49a495803a9451850b8982317e277b605c785587/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L5437] or [newLocatedStripedBlock|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L5435]) -> toDatanodeInfos and toStorageIDs, all under the writeLock).

This situation can be even more problematic when these methods are invoked repeatedly inside a loop, as in [createLocatedBlockList|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L1450] ([createLocatedBlocks -> createLocatedBlockList|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L1601] -> [createLocatedBlock|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L1487] -> (newLocatedBlock or newLocatedStripedBlock) -> toDatanodeInfos and toStorageIDs). Such call patterns can become a significant synchronization bottleneck when the number of blocks or the number of storages is large.
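To illustrate the linear cost, the two helpers are essentially per-element copy loops, roughly like the following (a simplified sketch of their shape, not the verbatim Hadoop source):

{code:java}
// Simplified sketch of the conversion pattern in DatanodeStorageInfo;
// the names follow the real methods, but this is not the exact source.
static DatanodeInfo[] toDatanodeInfos(DatanodeStorageInfo[] storages) {
  if (storages == null) {
    return null;
  }
  final DatanodeInfo[] datanodes = new DatanodeInfo[storages.length];
  for (int i = 0; i < storages.length; i++) { // one O(n) pass per call
    datanodes[i] = storages[i].getDatanodeDescriptor();
  }
  return datanodes;
}

static String[] toStorageIDs(DatanodeStorageInfo[] storages) {
  if (storages == null) {
    return null;
  }
  final String[] storageIDs = new String[storages.length];
  for (int i = 0; i < storages.length; i++) { // and another O(n) pass
    storageIDs[i] = storages[i].getStorageID();
  }
  return storageIDs;
}
{code}

Since createLocatedBlocks runs both conversions for every block it visits, the total work under the lock is roughly O(blocks x storages per block).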
[BlockPlacementPolicyDefault.getPipeline|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L1147], [BlockPlacementPolicyDefault.chooseTarget|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java#L287], and [BlockManager.validateReconstructionWork|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2355] ([BlockManager.computeReconstructionWorkForBlocks|https://github.com/apache/hadoop/blob/49a495803a9451850b8982317e277b605c785587/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2187] -> BlockManager.validateReconstructionWork -> [incrementBlocksScheduled|https://github.com/apache/hadoop/blob/2f0dd7c4feb1e482d47786d26d6d32483f39414b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeStorageInfo.java#L338]) face a similar lock-contention issue.
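One possible direction, just a sketch of the idea and not a vetted patch: hoist the linear conversions out of the locked region so the write lock only covers the actual state mutation. In the sketch below, buildLocatedBlock is a hypothetical helper (not an existing API), and it assumes the captured storages array is a stable snapshot that is safe to read outside the lock, which would need verifying:

{code:java}
// Hypothetical sketch: do the O(n) conversions before taking the lock.
// Assumes `namesystem` (an FSNamesystem) is in scope; buildLocatedBlock
// is an illustrative helper, not an existing API.
LocatedBlock bumpAndBuild(Block block, DatanodeStorageInfo[] storages,
    long newGenerationStamp) {
  // Linear conversions done without holding the namesystem lock.
  final DatanodeInfo[] locs = DatanodeStorageInfo.toDatanodeInfos(storages);
  final String[] ids = DatanodeStorageInfo.toStorageIDs(storages);

  namesystem.writeLock();
  try {
    // Only the mutation stays under the lock; the precomputed
    // arrays are merely read here.
    block.setGenerationStamp(newGenerationStamp);
    return buildLocatedBlock(block, locs, ids);
  } finally {
    namesystem.writeUnlock();
  }
}
{code}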
Please let me know if my analysis is wrong, and whether there are suggestions to make this better. Thanks


> Lock contention for DatanodeStorageInfo when the number of storage nodes is large
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-17638
>                 URL: https://issues.apache.org/jira/browse/HDFS-17638
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, server
>    Affects Versions: 3.4.0
>            Reporter: Goodness Ayinmode
>            Priority: Minor
>              Labels: None
--
This message was sent by Atlassian Jira
(v8.20.10#820010)