[ https://issues.apache.org/jira/browse/HDFS-16476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JiangHua Zhu updated HDFS-16476:
--------------------------------
    Description: 
The complete process of block recovery is as follows:
1. The NameNode collects the blocks that need to be recovered.
2. The NameNode sends recovery commands to some DataNodes for execution.
3. The DataNode reports back to the NameNode once execution is complete.

Currently there is no way to know how many blocks are being recovered. A metric 
recording PendingRecoveryBlocks should be added, which would help improve the 
robustness of the cluster.
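A minimal sketch of what such a metric could count, assuming the NameNode marks a block as pending when it sends the recovery command (step 2) and clears it when the DataNode's completion report arrives via commitBlockSynchronization (step 3). The class and method names (PendingRecoveryTracker, onRecoveryIssued, onRecoveryCommitted, getPendingRecoveryBlocks) are hypothetical and are not Hadoop APIs; a real implementation would publish the value through Hadoop's metrics system rather than a plain getter.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical tracker for blocks whose recovery is in flight.
 * Not part of Hadoop; it only illustrates what a PendingRecoveryBlocks gauge could report.
 */
class PendingRecoveryTracker {
  // IDs of blocks for which a recovery command was issued
  // but no commitBlockSynchronization has been received yet.
  private final Set<Long> pending = ConcurrentHashMap.newKeySet();

  /** Step 2: the NameNode sends a recovery command for this block. */
  void onRecoveryIssued(long blockId) {
    pending.add(blockId); // set semantics: re-issuing for the same block counts once
  }

  /** Step 3: the DataNode reports completion via commitBlockSynchronization. */
  void onRecoveryCommitted(long blockId) {
    pending.remove(blockId); // unknown or duplicate reports leave the count unchanged
  }

  /** The value a PendingRecoveryBlocks gauge would expose. */
  long getPendingRecoveryBlocks() {
    return pending.size();
  }
}
```

Keying by block ID rather than keeping a bare counter means a re-sent command or a duplicate completion report cannot push the gauge above the true count or below zero. In this sketch a recovery that never completes stays pending forever; a real implementation would also need a timeout or cleanup path.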

Here are some logs from the DataNode during recovery:
2022-02-10 23:51:04,386 [12208592621] - INFO  [IPC Server handler 38 on 8025:FsDatasetImpl@2687] - initReplicaRecovery: changing replica state for blk_xxxx from RBW to RUR
2022-02-10 23:51:04,395 [12208592630] - INFO  [IPC Server handler 47 on 8025:FsDatasetImpl@2708] - updateReplica: BP-xxxx:blk_xxxx, recoveryId=18386356475, length=129869866, replica=ReplicaUnderRecovery, blk_xxxx, RUR
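The initReplicaRecovery line records the replica moving from RBW (replica being written) to RUR (replica under recovery). Until a metric exists, such transitions could be pulled out of DataNode logs as a rough stopgap. A minimal sketch, assuming each log entry sits on a single line and the message format matches the one shown above; the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts replica state transitions from DataNode log lines. */
class ReplicaTransitionParser {
  // Matches e.g. "initReplicaRecovery: changing replica state for blk_xxxx from RBW to RUR"
  private static final Pattern TRANSITION = Pattern.compile(
      "initReplicaRecovery: changing replica state for (\\S+) from (\\w+) to (\\w+)");

  /** Returns {block, fromState, toState}, or empty if the line is not such a transition. */
  static Optional<String[]> parse(String logLine) {
    Matcher m = TRANSITION.matcher(logLine);
    if (!m.find()) {
      return Optional.empty();
    }
    return Optional.of(new String[] {m.group(1), m.group(2), m.group(3)});
  }
}
```

Counting these transitions only shows which recoveries have started on a given DataNode; it cannot tell which have been committed on the NameNode, which is the gap a NameNode-side PendingRecoveryBlocks metric would close.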

Here is a log entry the NameNode writes after receiving the completion report:
2022-02-22 10:43:58,780 [8193058814] - INFO  [IPC Server handler 15 on 8021:FSNamesystem@3647] - commitBlockSynchronization(oldBlock=BP-xxxx, newgenerationstamp=18551926574, newlength=16929, newtargets=[xxxx1:1004, xxxx2:1004, xxxx3:1004]) successful
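Each successful commitBlockSynchronization entry marks step 3 completing for one block. A minimal sketch of extracting the new generation stamp, length and targets from such an entry, assuming one entry per line and the field order shown above; the regex skips whatever fields precede newgenerationstamp, and the CommitSyncRecord class is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical record of one successful commitBlockSynchronization parsed from a NameNode log. */
final class CommitSyncRecord {
  // Lenient: ".*?" skips any fields (such as oldBlock) that precede newgenerationstamp.
  private static final Pattern COMMIT = Pattern.compile(
      "commitBlockSynchronization\\(.*?newgenerationstamp=(\\d+), newlength=(\\d+), "
          + "newtargets=\\[([^\\]]*)\\]\\) successful");

  final long newGenerationStamp;
  final long newLength;
  final List<String> newTargets;

  private CommitSyncRecord(long newGenerationStamp, long newLength, List<String> newTargets) {
    this.newGenerationStamp = newGenerationStamp;
    this.newLength = newLength;
    this.newTargets = newTargets;
  }

  /** Returns the parsed record, or empty if the line is not a successful commit. */
  static Optional<CommitSyncRecord> parse(String logLine) {
    Matcher m = COMMIT.matcher(logLine);
    if (!m.find()) {
      return Optional.empty();
    }
    String targets = m.group(3).trim();
    List<String> targetList = targets.isEmpty()
        ? Collections.<String>emptyList()
        : Arrays.asList(targets.split(",\\s*")); // "host:port, host:port" -> list
    return Optional.of(new CommitSyncRecord(
        Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), targetList));
  }
}
```

Generation stamps are parsed as long because values such as the one above exceed the int range. Counting these entries shows how many recoveries finished, but without a matching count of issued recovery commands it still cannot give the number pending.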




> Increase the number of metrics used to record PendingRecoveryBlocks
> -------------------------------------------------------------------
>
>                 Key: HDFS-16476
>                 URL: https://issues.apache.org/jira/browse/HDFS-16476
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: metrics, namenode
>    Affects Versions: 2.9.2, 3.4.0
>            Reporter: JiangHua Zhu
>            Assignee: JiangHua Zhu
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
