[
https://issues.apache.org/jira/browse/HDFS-16593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598797#comment-17598797
]
ASF GitHub Bot commented on HDFS-16593:
---------------------------------------
Hexiaoqiao commented on code in PR #4353:
URL: https://github.com/apache/hadoop/pull/4353#discussion_r960344532
##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##########
@@ -2350,6 +2350,9 @@ private void invalidate(String bpid, Block[] invalidBlks, boolean async)
       removing = volumeMap.remove(bpid, invalidBlks[i]);
       addDeletingBlock(bpid, removing.getBlockId());
       LOG.debug("Block file {} is to be deleted", removing.getBlockURI());
+      if (datanode.getMetrics() != null) {
Review Comment:
Ah, I believe this will make `BlocksRemoved` more precise, but I do not
think it will be exact. Consider the case where `removing == null`, which
means the DataNode does not manage this pending-delete replica; in that
case we should not increment the metric. There may be other cases as well
that I have not thought through deeply. I just suggest we consider whether
`removing` can be null.
Another point: line 2352 will throw an NPE because it does not check
whether `removing` is null. That is a separate issue, unrelated to this PR.
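For illustration, a minimal sketch of the null check being suggested here,
reusing the names from the hunk above; this is an assumption about what the
fix could look like, not the committed change:
{code:java}
removing = volumeMap.remove(bpid, invalidBlks[i]);
if (removing == null) {
  // The DataNode does not manage this replica, so nothing is actually
  // deleted and BlocksRemoved should not be incremented.
  continue;
}
addDeletingBlock(bpid, removing.getBlockId());
LOG.debug("Block file {} is to be deleted", removing.getBlockURI());
if (datanode.getMetrics() != null) {
  datanode.getMetrics().incrBlocksRemoved(1);
}
{code}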
> Correct inaccurate BlocksRemoved metric on DataNode side
> --------------------------------------------------------
>
> Key: HDFS-16593
> URL: https://issues.apache.org/jira/browse/HDFS-16593
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When tracing the root cause of a production issue, I found that the
> BlocksRemoved metric on the DataNode side was inaccurate.
> {code:java}
>     case DatanodeProtocol.DNA_INVALIDATE:
>       //
>       // Some local block(s) are obsolete and can be
>       // safely garbage-collected.
>       //
>       Block toDelete[] = bcmd.getBlocks();
>       try {
>         // using global fsdataset
>         dn.getFSDataset().invalidate(bcmd.getBlockPoolId(), toDelete);
>       } catch(IOException e) {
>         // Exceptions caught here are not expected to be disk-related.
>         throw e;
>       }
>       dn.metrics.incrBlocksRemoved(toDelete.length);
>       break;
> {code}
> The metric is inaccurate because even if the invalidate method throws an
> exception, some blocks may already have been deleted successfully inside it.
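To make that failure mode concrete, here is a small standalone sketch (all
names invented, not Hadoop code) showing how a batch-sized increment after a
partially failing delete loop undercounts blocks that were in fact removed:
{code:java}
import java.io.IOException;

public class MetricSketch {
  static int blocksRemoved = 0; // stand-in for dn.metrics

  // Deletes blocks one by one; a negative id simulates a failure
  // partway through the batch.
  static void invalidate(int[] blocks) throws IOException {
    for (int b : blocks) {
      if (b < 0) {
        throw new IOException("failed deleting block " + b);
      }
      // block b is deleted successfully here
    }
  }

  public static void main(String[] args) {
    int[] toDelete = {1, 2, -3, 4}; // third deletion fails
    try {
      invalidate(toDelete);
      blocksRemoved += toDelete.length; // skipped on exception
    } catch (IOException e) {
      // blocks 1 and 2 were deleted, but the metric still reads 0
    }
    System.out.println("BlocksRemoved = " + blocksRemoved); // prints 0
  }
}
{code}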