[
https://issues.apache.org/jira/browse/HDFS-17054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735220#comment-17735220
]
ASF GitHub Bot commented on HDFS-17054:
---------------------------------------
zhangshuyan0 commented on code in PR #5760:
URL: https://github.com/apache/hadoop/pull/5760#discussion_r1235035757
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:
##########
@@ -4544,25 +4546,32 @@ public NumberReplicas countNodes(BlockInfo b) {
NumberReplicas countNodes(BlockInfo b, boolean inStartupSafeMode) {
NumberReplicas numberReplicas = new NumberReplicas();
Collection<DatanodeDescriptor> nodesCorrupt = corruptReplicas.getNodes(b);
+ HashSet<DatanodeDescriptor> haveComputedAsCorrupted = null;
if (b.isStriped()) {
+ haveComputedAsCorrupted = new HashSet<>();
countReplicasForStripedBlock(numberReplicas, (BlockInfoStriped) b,
- nodesCorrupt, inStartupSafeMode);
+ nodesCorrupt, inStartupSafeMode, haveComputedAsCorrupted);
} else {
for (DatanodeStorageInfo storage : blocksMap.getStorages(b)) {
checkReplicaOnStorage(numberReplicas, b, storage, nodesCorrupt,
- inStartupSafeMode);
+ inStartupSafeMode, haveComputedAsCorrupted);
}
}
return numberReplicas;
}
private StoredReplicaState checkReplicaOnStorage(NumberReplicas counters,
BlockInfo b, DatanodeStorageInfo storage,
- Collection<DatanodeDescriptor> nodesCorrupt, boolean inStartupSafeMode) {
+ Collection<DatanodeDescriptor> nodesCorrupt, boolean inStartupSafeMode,
+ HashSet<DatanodeDescriptor> haveComputedAsCorrupted) {
final StoredReplicaState s;
if (storage.getState() == State.NORMAL) {
final DatanodeDescriptor node = storage.getDatanodeDescriptor();
- if (nodesCorrupt != null && nodesCorrupt.contains(node)) {
+ if (nodesCorrupt != null && nodesCorrupt.contains(node) &&
+ (haveComputedAsCorrupted == null ||
+ !haveComputedAsCorrupted.contains(node))) {
+ if (haveComputedAsCorrupted != null) {
+ haveComputedAsCorrupted.add(node);
Review Comment:
If I understand your code correctly, when the same block group has two
internal blocks on the same datanode, you will count only one of them. IMO,
the current implementation of `CorruptReplicasMap` does not record which
specific internal block on the datanode was corrupt, so how could you confirm
that only one internal block is corrupt?
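To illustrate the concern above, here is a minimal, hypothetical Java sketch (not the real `BlockManager`/`CorruptReplicasMap` API; the class, method, and datanode names are invented for illustration). It shows that deduplicating corrupt counts by datanode reports at most one corrupt replica per node, even when two internal blocks of the same block group on that node may both be corrupt:

```java
import java.util.*;

// Hypothetical sketch: the corrupt-replica map only records *datanodes* that
// hold a corrupt replica of the block group, not which internal block is bad.
public class CorruptCountSketch {
    // storages: datanodeId -> internal block indices stored on that node.
    // corruptNodes: datanodes recorded as holding a corrupt replica.
    static int countCorruptPerDatanodeOnce(Map<String, List<Integer>> storages,
                                           Set<String> corruptNodes) {
        Set<String> seen = new HashSet<>();  // mirrors haveComputedAsCorrupted
        int corrupt = 0;
        for (Map.Entry<String, List<Integer>> e : storages.entrySet()) {
            for (int internalBlock : e.getValue()) {
                // seen.add returns false on the second replica from the same
                // node, so that node contributes at most one corrupt count.
                if (corruptNodes.contains(e.getKey()) && seen.add(e.getKey())) {
                    corrupt++;
                }
            }
        }
        return corrupt;
    }

    public static void main(String[] args) {
        // dn1 holds two internal blocks of the same block group.
        Map<String, List<Integer>> storages = new LinkedHashMap<>();
        storages.put("dn1", Arrays.asList(0, 1));
        storages.put("dn2", Arrays.asList(2));
        Set<String> corruptNodes = new HashSet<>(Arrays.asList("dn1"));
        // Reports 1 even if both internal blocks on dn1 are corrupt,
        // which is exactly the reviewer's question.
        System.out.println(countCorruptPerDatanodeOnce(storages, corruptNodes));
    }
}
```

Under this per-datanode deduplication, the map cannot distinguish "one corrupt internal block on dn1" from "two corrupt internal blocks on dn1"; both cases print 1.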
> Erasure coding: optimize the checkReplicaOnStorage method to avoid repeatedly
> regarding all replicas on one datanode as corrupt.
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-17054
> URL: https://issues.apache.org/jira/browse/HDFS-17054
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: farmmamba
> Priority: Critical
> Labels: pull-request-available
>
> Currently, when we execute the countNodes method for a striped block, a
> datanode may hold two EC blocks with different block IDs (because of some
> special cases, which I will find and fix later). One block is LIVE and the
> other is CORRUPT. The current logic counts two CORRUPT replicas because
> corruptReplicas contains the datanode. We should fix it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]