[
https://issues.apache.org/jira/browse/HDFS-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284351#comment-14284351
]
Byron Wong commented on HDFS-7611:
----------------------------------
This bug occurs only in tests that involve restarts. It happens because blocks
from files created in previous tests are not deleted when replaying edit logs.
1) I'm still investigating the source of this, but at some point while replaying
edits, {{DirectoryWithSnapshotFeature$cleanDirectory}} can decrement an INode's
namespace quota to a negative value. Either the namespace count was overcounted
while cleaning directories or snapshotDiff, or the INode's namespace quota
wasn't counted up properly in the first place.
2) If the INode's namespace quota happens to be -1, the blocks associated with
that inode will not be deleted. When we call {{fsd.removeLastINode(iip)}} in
{{FSDirDeleteOp$unprotectedDelete}}, we explicitly check whether its return
code is -1. In that case, we skip collecting the blocks that should be deleted.
Notice that in {{FSDirectory$removeLastINode}}, one of the possible returns is
{{return counts.get(Quota.NAMESPACE)}}.
3) Now there are blocks in the blocksMap that shouldn't be there. This will
increase the number of blocks needed to get out of safeMode. The test failure
depends on whether the namenode receives these blocks. If it does, then the
namenode will exit safeMode and the test will succeed.
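To make the collision in 2) concrete, here is a minimal, self-contained sketch.
It is not the actual Hadoop code: the class, the method names and the
{{NOTHING_REMOVED}} constant are hypothetical stand-ins. It only models a return
value that serves both as a "nothing was removed" sentinel and as a namespace
count that can itself reach -1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SentinelCollisionSketch {
    /** Hypothetical sentinel meaning "no inode was removed". */
    static final long NOTHING_REMOVED = -1;

    /**
     * Stand-in for a removeLastINode-style call (assumption, not the real
     * method): returns the sentinel when the inode is absent, otherwise the
     * namespace count attributed to the removed inode.
     */
    static long removeLast(boolean present, long namespaceCount) {
        if (!present) {
            return NOTHING_REMOVED;
        }
        // If the count was wrongly decremented to -1, the caller cannot tell
        // this apart from the sentinel above.
        return namespaceCount;
    }

    /**
     * Stand-in for an unprotectedDelete-style caller: returns the blocks
     * collected for deletion.
     */
    static List<String> delete(boolean present, long namespaceCount,
                               List<String> inodeBlocks) {
        List<String> collected = new ArrayList<>();
        long removed = removeLast(present, namespaceCount);
        if (removed == NOTHING_REMOVED) {
            return collected; // block collection is skipped
        }
        collected.addAll(inodeBlocks);
        return collected;
    }

    public static void main(String[] args) {
        List<String> blocks = Arrays.asList("blk_1", "blk_2");
        // Healthy count: both blocks are collected for deletion.
        System.out.println(delete(true, 1, blocks).size());  // prints 2
        // Count of -1: the inode was removed, but its blocks are left behind.
        System.out.println(delete(true, -1, blocks).size()); // prints 0
    }
}
```

In the second call the inode is gone but none of its blocks were collected,
which corresponds to the leftover blocksMap entries described in 3).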
> TestFileTruncate.testTruncateEditLogLoad times out waiting for Mini HDFS
> Cluster to start
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-7611
> URL: https://issues.apache.org/jira/browse/HDFS-7611
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Byron Wong
> Attachments: testTruncateEditLogLoad.log
>
>
> I've seen it failing on Jenkins a couple of times. Somehow the cluster is not
> coming ready after NN restart.
> Not sure if it is truncate specific, as I've seen same behaviour with other
> tests that restart the NameNode.