[jira] [Commented] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning

Fei Hui (Jira) Thu, 26 Sep 2019 20:08:25 -0700


    [ https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939084#comment-16939084 ]


Fei Hui commented on HDFS-14849:
--------------------------------

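+1 from me.
This fix follows the same approach as the existing countReplicasForStripedBlock implementation, which uses a BitSet of block indices to count duplicated live internal blocks as REDUNDANT instead of LIVE: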
{code}
  /**
   * For a striped block, it is possible it contains full number of internal
   * blocks (i.e., 9 by default), but with duplicated replicas of the same
   * internal block. E.g., for the following list of internal blocks
   * b0, b0, b1, b2, b3, b4, b5, b6, b7
   * we have 9 internal blocks but we actually miss b8.
   * We should use this method to detect the above scenario and schedule
   * necessary reconstruction.
   */
  private void countReplicasForStripedBlock(NumberReplicas counters,
      BlockInfoStriped block, Collection<DatanodeDescriptor> nodesCorrupt,
      boolean inStartupSafeMode) {
    BitSet bitSet = new BitSet(block.getTotalBlockNum());
    for (StorageAndBlockIndex si : block.getStorageAndIndexInfos()) {
      StoredReplicaState state = checkReplicaOnStorage(counters, block,
          si.getStorage(), nodesCorrupt, inStartupSafeMode);
      if (state == StoredReplicaState.LIVE) {
        if (!bitSet.get(si.getBlockIndex())) {
          bitSet.set(si.getBlockIndex());
        } else {
          counters.subtract(StoredReplicaState.LIVE, 1);
          counters.add(StoredReplicaState.REDUNDANT, 1);
        }
      }
    }
  }
{code}
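To make the idea concrete, here is a minimal standalone sketch of the same BitSet de-duplication, run on the b0, b0, b1, ..., b7 example from the Javadoc above. It is not the BlockManager code: the class StripedReplicaCount, the Counts holder, and the count method are hypothetical names invented for illustration, and the sketch assumes every reported replica is already known to be live (the real method decides that per storage via checkReplicaOnStorage).

```java
import java.util.BitSet;

// Hypothetical illustration only; not part of Hadoop.
class StripedReplicaCount {

    /** Hypothetical stand-in for NumberReplicas: just two counters. */
    static final class Counts {
        int live;
        int redundant;
    }

    /**
     * Counts distinct live internal blocks of one striped block group.
     *
     * @param totalBlockNum number of internal blocks in the group (9 for RS-6-3)
     * @param liveIndices   internal block index reported by each live storage
     */
    static Counts count(int totalBlockNum, int[] liveIndices) {
        BitSet seen = new BitSet(totalBlockNum);
        Counts counts = new Counts();
        for (int index : liveIndices) {
            if (!seen.get(index)) {
                // First replica of this internal block: it counts as live.
                seen.set(index);
                counts.live++;
            } else {
                // Same internal block seen again: a duplicate, not extra redundancy.
                counts.redundant++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // b0, b0, b1, b2, b3, b4, b5, b6, b7: nine replicas, but b8 is missing.
        int[] reported = {0, 0, 1, 2, 3, 4, 5, 6, 7};
        Counts counts = count(9, reported);
        // Prints: live=8 redundant=1
        System.out.println("live=" + counts.live + " redundant=" + counts.redundant);
    }
}
```

With only 8 distinct live internal blocks out of 9, the block group still needs reconstruction of b8 even though 9 replicas were reported, which is the situation the Javadoc describes.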
[~ayushtkn] Could you please take a look?

> Erasure Coding: the internal block is replicated many times when datanode is 
> decommissioning
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14849
>                 URL: https://issues.apache.org/jira/browse/HDFS-14849
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec, erasure-coding
>    Affects Versions: 3.3.0
>            Reporter: HuangTao
>            Assignee: HuangTao
>            Priority: Major
>              Labels: EC, HDFS, NameNode
>         Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch, 
> fsck-file.png, liveBlockIndices.png, scheduleReconstruction.png
>
>
> When a datanode stays in DECOMMISSION_INPROGRESS status, the EC internal 
> block on that datanode is replicated many times.
> // added 2019/09/19
> I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes 
> simultaneously. 
>  !scheduleReconstruction.png! 
>  !fsck-file.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
