[ https://issues.apache.org/jira/browse/HDFS-16593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598797#comment-17598797 ]

ASF GitHub Bot commented on HDFS-16593:
---------------------------------------

Hexiaoqiao commented on code in PR #4353:
URL: https://github.com/apache/hadoop/pull/4353#discussion_r960344532


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##########
@@ -2350,6 +2350,9 @@ private void invalidate(String bpid, Block[] invalidBlks, boolean async)
         removing = volumeMap.remove(bpid, invalidBlks[i]);
         addDeletingBlock(bpid, removing.getBlockId());
         LOG.debug("Block file {} is to be deleted", removing.getBlockURI());
+        if (datanode.getMetrics() != null) {

Review Comment:
   Ah, I believe this will make `BlocksRemoved` more precise, but I do not think it will be exact. Consider the case where `removing == null`, which means the DataNode does not manage this pending-delete replica, so we should not increase the metric. There may be other cases as well that I have not thought through deeply. I just suggest we consider whether `removing` can be null here.
   Another point: line 2352 will throw an NPE because it does not check whether `removing` is null. That is a separate issue, not related to this PR.





> Correct inaccurate BlocksRemoved metric on DataNode side
> --------------------------------------------------------
>
>                 Key: HDFS-16593
>                 URL: https://issues.apache.org/jira/browse/HDFS-16593
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> When tracing the root cause of a production issue, I found that the 
> BlocksRemoved metric on the DataNode side was inaccurate.
> {code:java}
> case DatanodeProtocol.DNA_INVALIDATE:
>       //
>       // Some local block(s) are obsolete and can be 
>       // safely garbage-collected.
>       //
>       Block toDelete[] = bcmd.getBlocks();
>       try {
>         // using global fsdataset
>         dn.getFSDataset().invalidate(bcmd.getBlockPoolId(), toDelete);
>       } catch(IOException e) {
>         // Exceptions caught here are not expected to be disk-related.
>         throw e;
>       }
>       dn.metrics.incrBlocksRemoved(toDelete.length);
>       break;
> {code}
> The count is inaccurate because even if the invalidate method throws an 
> exception, some blocks may already have been deleted successfully inside it, 
> so incrementing BlocksRemoved by toDelete.length afterwards does not reflect 
> the actual number of removed blocks.
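For illustration only, a hedged sketch of one possible shape of the fix: if FsDatasetImpl#invalidate increments BlocksRemoved per block it actually processes (as the diff hunk in the review above suggests), the DNA_INVALIDATE handler no longer needs the blanket increment that over-counts on partial failure. Identifiers follow the quoted code; this is a sketch, not necessarily the actual PR change.

{code:java}
case DatanodeProtocol.DNA_INVALIDATE:
  //
  // Some local block(s) are obsolete and can be
  // safely garbage-collected.
  //
  Block toDelete[] = bcmd.getBlocks();
  try {
    // using global fsdataset; per-block metric accounting is assumed to
    // happen inside invalidate() itself
    dn.getFSDataset().invalidate(bcmd.getBlockPoolId(), toDelete);
  } catch (IOException e) {
    // Exceptions caught here are not expected to be disk-related.
    throw e;
  }
  // dn.metrics.incrBlocksRemoved(toDelete.length);  // removed in this sketch:
  // would over-count when only some blocks were deleted before the exception.
  break;
{code}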



