[
https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939084#comment-16939084
]
Fei Hui commented on HDFS-14849:
--------------------------------
+1 from me
This fix is like the function countReplicasForStripedBlock implement
{code}
/**
* For a striped block, it is possible it contains full number of internal
* blocks (i.e., 9 by default), but with duplicated replicas of the same
* internal block. E.g., for the following list of internal blocks
* b0, b0, b1, b2, b3, b4, b5, b6, b7
* we have 9 internal blocks but we actually miss b8.
* We should use this method to detect the above scenario and schedule
* necessary reconstruction.
*/
private void countReplicasForStripedBlock(NumberReplicas counters,
BlockInfoStriped block, Collection<DatanodeDescriptor> nodesCorrupt,
boolean inStartupSafeMode) {
BitSet bitSet = new BitSet(block.getTotalBlockNum());
for (StorageAndBlockIndex si : block.getStorageAndIndexInfos()) {
StoredReplicaState state = checkReplicaOnStorage(counters, block,
si.getStorage(), nodesCorrupt, inStartupSafeMode);
if (state == StoredReplicaState.LIVE) {
if (!bitSet.get(si.getBlockIndex())) {
bitSet.set(si.getBlockIndex());
} else {
counters.subtract(StoredReplicaState.LIVE, 1);
counters.add(StoredReplicaState.REDUNDANT, 1);
}
}
}
}
{code}
[~ayushtkn] Could you please take a look?
> Erasure Coding: the internal block is replicated many times when datanode is
> decommissioning
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-14849
> URL: https://issues.apache.org/jira/browse/HDFS-14849
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ec, erasure-coding
> Affects Versions: 3.3.0
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Major
> Labels: EC, HDFS, NameNode
> Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch,
> fsck-file.png, liveBlockIndices.png, scheduleReconstruction.png
>
>
> When the datanode keeping in DECOMMISSION_INPROGRESS status, the EC internal
> block in that datanode will be replicated many times.
> // added 2019/09/19
> I reproduced this scenario in a 163 nodes cluster with decommission 100 nodes
> simultaneously.
> !scheduleReconstruction.png!
> !fsck-file.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]