[
https://issues.apache.org/jira/browse/HDFS-17048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732506#comment-17732506
]
liuguanghua commented on HDFS-17048:
------------------------------------
[~hexiaoqiao], I found the reason: my datanode config fs.getspaceused.classname
is set to DFCachingGetSpaceUsed, not DU. So the datanode reports a dfsUsed to
the namenode that is larger than the actual size. Thank you very much.
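For context, a minimal sketch of how a datanode-side process picks its space-usage implementation from this key (assuming the GetSpaceUsed.Builder API of recent Hadoop releases; the directory path below is a placeholder):
{code:java}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DU;
import org.apache.hadoop.fs.GetSpaceUsed;

public class SpaceUsedExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Exact du-based accounting; DFCachingGetSpaceUsed instead derives usage
    // from df on the volume, which counts everything on the partition,
    // not just the block directories.
    conf.set("fs.getspaceused.classname", DU.class.getName());

    GetSpaceUsed spaceUsed = new GetSpaceUsed.Builder()
        .setConf(conf)
        .setPath(new File("/data/dn/current"))  // placeholder datanode directory
        .build();
    System.out.println("dfsUsed estimate: " + spaceUsed.getUsed());
  }
}
{code}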
> FSNamesystem.delete() may cause data residue when the active namenode crashes
> or shuts down
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-17048
> URL: https://issues.apache.org/jira/browse/HDFS-17048
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Environment: HDFS 3.3
> Reporter: liuguanghua
> Priority: Major
>
> Consider the following scenario:
> (1) A user deletes an HDFS directory with many blocks.
> (2) Then the active Namenode crashes, is shut down, or is failed over to the
> standby Namenode by an administrator.
> (3) This may leave residual block data on the datanodes.
>
> FSNamesystem.delete() will:
> (1) delete the directory from the namespace first,
> (2) add toRemovedBlocks into markedDeleteQueue,
> (3) let the MarkedDeleteBlockScrubber thread consume markedDeleteQueue and
> delete the blocks.
> If the active namenode crashes before step (3) finishes, the blocks still in
> markedDeleteQueue are lost and never deleted. Such a block can no longer be
> found via the hdfs fsck command, but it is still alive on the datanode disk,
> as sketched below.
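> A simplified, self-contained sketch of this flow (not the actual FSNamesystem
> code; the class, queue type, and block-id representation are illustrative only):
> {code:java}
> import java.util.List;
> import java.util.concurrent.ConcurrentLinkedQueue;
>
> // Illustrative stand-in for the delete path described above, not real HDFS code.
> public class MarkedDeleteSketch {
>   // In-memory queue only: its contents are lost if the process dies,
>   // which is the residue problem described above.
>   private final ConcurrentLinkedQueue<List<Long>> markedDeleteQueue =
>       new ConcurrentLinkedQueue<>();
>
>   // Steps (1) and (2): remove the directory from the namespace, then enqueue
>   // the collected block ids (toRemovedBlocks) for later physical deletion.
>   public void delete(String path, List<Long> toRemovedBlocks) {
>     removeFromNamespace(path);              // namespace edit is durable (edit log)
>     markedDeleteQueue.add(toRemovedBlocks); // block deletion is only queued in memory
>   }
>
>   // Step (3): a background scrubber thread drains the queue and invalidates blocks.
>   public void startScrubber() {
>     Thread scrubber = new Thread(() -> {
>       while (true) {
>         List<Long> batch = markedDeleteQueue.poll();
>         if (batch == null) {
>           try { Thread.sleep(100); } catch (InterruptedException e) { return; }
>           continue;
>         }
>         batch.forEach(MarkedDeleteSketch::invalidateBlock);
>       }
>     }, "MarkedDeleteBlockScrubber");
>     scrubber.setDaemon(true);
>     scrubber.start();
>   }
>
>   private void removeFromNamespace(String path) { /* omitted */ }
>   private static void invalidateBlock(long blockId) { /* omitted */ }
> }
> {code}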
>
> Thus,
> SummaryA = hdfs dfs -du -s /
> SummaryB = sum(dfsUsed reported by each datanode)
> SummaryA < SummaryB
>
> This may be unavoidable. But is there any way to find the blocks that should
> have been deleted and clean them up?
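> One way to observe the gap programmatically, as a hedged sketch (it assumes a
> client Configuration that points at the cluster and uses the public
> FileSystem/DistributedFileSystem APIs; it mirrors the comparison above):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hdfs.DistributedFileSystem;
> import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
>
> public class ResidueCheck {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration(); // assumes HDFS client config on classpath
>     try (FileSystem fs = FileSystem.get(conf)) {
>       DistributedFileSystem dfs = (DistributedFileSystem) fs;
>
>       // SummaryA: namespace view, comparable to "hdfs dfs -du -s /"
>       // (getSpaceConsumed() includes replication, like the second -du column).
>       long summaryA = dfs.getContentSummary(new Path("/")).getSpaceConsumed();
>
>       // SummaryB: sum of dfsUsed reported by every live datanode.
>       long summaryB = 0;
>       for (DatanodeInfo dn : dfs.getDataNodeStats()) {
>         summaryB += dn.getDfsUsed();
>       }
>
>       System.out.println("SummaryA (namespace) = " + summaryA);
>       System.out.println("SummaryB (datanodes) = " + summaryB);
>       System.out.println("possible residue     = " + (summaryB - summaryA));
>     }
>   }
> }
> {code}
> Some difference is expected even on a healthy cluster (e.g. block metadata
> files on the datanode volumes), so this only flags large discrepancies.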
>