[
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370995#comment-17370995
]
Daniel Ma edited comment on HDFS-15796 at 6/29/21, 2:36 AM:
------------------------------------------------------------
[~weichiu] I have no idea what conditions reproduce this problem. It seems
the targets list is modified elsewhere while computeReconstructionWorkForBlocks
is in progress, owing to a thread-safety issue.
{code:java}
// Step 2: choose target nodes for each reconstruction task
for (BlockReconstructionWork rw : reconWork) {
  // Exclude all of the containing nodes from being targets.
  // This list includes decommissioning or corrupt nodes.
  final Set<Node> excludedNodes = new HashSet<>(rw.getContainingNodes());
  List<DatanodeStorageInfo> targets = pendingReconstruction
      .getTargets(rw.getBlock());
  if (targets != null) {
    // Suspected failure point: if another thread structurally modifies
    // `targets` while this loop runs, the list's fail-fast iterator
    // throws ConcurrentModificationException.
    for (DatanodeStorageInfo dn : targets) {
      if (!excludedNodes.contains(dn.getDatanodeDescriptor())) {
        excludedNodes.add(dn.getDatanodeDescriptor());
      }
    }
  }
  // choose replication targets: NOT HOLDING THE GLOBAL LOCK
  final BlockPlacementPolicy placementPolicy =
      placementPolicies.getPolicy(rw.getBlock().getBlockType());
  rw.chooseTargets(placementPolicy, storagePolicySuite, excludedNodes);
}
{code}
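The exception in the stack trace is raised by ArrayList's fail-fast iterator (`checkForComodification`), which throws whenever the list is structurally modified after the iterator was created. A minimal, self-contained sketch of that mechanism follows. Plain Strings stand in for DatanodeStorageInfo, and `ComodificationDemo`, `iterateWhileModifying` and `iterateOverSnapshot` are hypothetical names, not Hadoop code. It trips the same check deterministically in one thread; a real cross-thread race is timing-dependent, and ArrayList's detection is only best-effort.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

final class ComodificationDemo {

    /**
     * Iterates over {@code targets} and appends to the same list mid-loop,
     * standing in for "another thread" modifying it. Returns true if the
     * fail-fast iterator threw ConcurrentModificationException.
     */
    static boolean iterateWhileModifying(List<String> targets) {
        try {
            for (String dn : targets) {
                if (dn.equals("dn1")) {
                    targets.add("dn3"); // structural modification during iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /**
     * Same loop, but iterating over a snapshot copy: appending to the
     * original list no longer invalidates the iterator.
     */
    static boolean iterateOverSnapshot(List<String> targets) {
        try {
            for (String dn : new ArrayList<>(targets)) {
                if (dn.equals("dn1")) {
                    targets.add("dn3");
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileModifying(new ArrayList<>(List.of("dn1", "dn2")))); // true
        System.out.println(iterateOverSnapshot(new ArrayList<>(List.of("dn1", "dn2"))));   // false
    }
}
```

With real threads, the copy only helps if it is taken under the same lock the writers hold; otherwise the copy constructor can itself race with a writer.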
> ConcurrentModificationException error happens on NameNode occasionally
> ----------------------------------------------------------------------
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.1
> Reporter: Daniel Ma
> Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor thread received Runtime exception. | BlockManager.java:4746
> java.util.ConcurrentModificationException
>     at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>     at java.util.ArrayList$Itr.next(ArrayList.java:859)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
>
>
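One way to close such a race is sketched below, under the assumption that every writer to the pending targets goes through a single lock. The shared holder returns a defensive copy taken while holding that lock, so a caller like the `pendingReconstruction.getTargets(...)` loop in computeReconstructionWorkForBlocks iterates a private list. `PendingTargetsSketch`, `addTarget`, `getTargets` and `raceWithoutCme` are hypothetical names. This is not the actual Hadoop PendingReconstructionBlocks code or a committed patch.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder of "block id -> pending target datanodes"; every access locks this object. */
final class PendingTargetsSketch {
    private final Map<Long, List<String>> targetsByBlock = new HashMap<>();

    synchronized void addTarget(long blockId, String datanode) {
        targetsByBlock.computeIfAbsent(blockId, k -> new ArrayList<>()).add(datanode);
    }

    /** Returns a copy taken under the lock, or null if the block has no pending targets. */
    synchronized List<String> getTargets(long blockId) {
        List<String> found = targetsByBlock.get(blockId);
        return found == null ? null : new ArrayList<>(found);
    }

    /**
     * Runs a writer thread that keeps adding targets while this thread
     * iterates over snapshots. Returns true if no ConcurrentModificationException was seen.
     */
    static boolean raceWithoutCme(int rounds) {
        PendingTargetsSketch pending = new PendingTargetsSketch();
        pending.addTarget(1L, "dn0");
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= rounds; i++) {
                pending.addTarget(1L, "dn" + i);
            }
        });
        writer.start();
        boolean sawCme = false;
        long visited = 0;
        try {
            for (int i = 0; i < rounds; i++) {
                // Each pass walks a private copy, so the writer cannot invalidate its iterator.
                for (String dn : pending.getTargets(1L)) {
                    visited += dn.length();
                }
            }
        } catch (ConcurrentModificationException e) {
            sawCme = true;
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !sawCme && visited > 0;
    }

    public static void main(String[] args) {
        System.out.println(raceWithoutCme(2_000)); // true
    }
}
```

The trade-off is an O(n) copy per call in exchange for never handing callers the mutable list that writers share.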
--
This message was sent by Atlassian Jira
(v8.3.4#803005)