[ 
https://issues.apache.org/jira/browse/HDFS-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284351#comment-14284351
 ] 

Byron Wong commented on HDFS-7611:
----------------------------------

This bug occurs only in tests with restarts, because blocks from files created 
in previous tests are not deleted when the edit logs are replayed.
1) I'm still investigating the source of this, but at some point while replaying 
edits, {{DirectoryWithSnapshotFeature$cleanDirectory}} can decrement an INode's 
namespace quota to a negative value. Either the namespace count was overcounted 
while cleaning directories or snapshotDiff, or the INode's namespace quota 
wasn't counted up properly in the first place.
2) If the INode's namespace quota happens to be -1, the blocks associated with 
that inode will not be deleted. When we call {{fsd.removeLastINode(iip)}} in 
{{FSDirDeleteOp$unprotectedDelete}}, we explicitly check whether its return 
value is -1. In that case, we skip collecting the blocks that should be deleted. 
Notice that in {{FSDirectory$removeLastINode}}, one of the possible return 
values is {{return counts.get(Quota.NAMESPACE)}}.
3) Now there are blocks in the blocksMap that shouldn't be there. This will 
increase the number of blocks needed to get out of safeMode. The test failure 
depends on whether the namenode receives these blocks. If it does, then the 
namenode will exit safeMode and the test will succeed.
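
The collision described in point 2 can be illustrated with a minimal, 
self-contained Java sketch. This is not the actual HDFS code: the class 
SentinelCollisionSketch, the Node type, and the simplified method bodies are 
hypothetical stand-ins, and only the method names echo the ones mentioned 
above. It assumes that -1 is used both as a "nothing was removed" marker and as 
a value that a miscounted namespace quota can legitimately take.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of the sentinel collision; not real HDFS code. */
public class SentinelCollisionSketch {

    /** Stand-in for an inode: a namespace count plus the IDs of its blocks. */
    static final class Node {
        final long namespaceCount;
        final List<Long> blocks;

        Node(long namespaceCount, List<Long> blocks) {
            this.namespaceCount = namespaceCount;
            this.blocks = blocks;
        }
    }

    /**
     * Assumed contract: returns -1 when nothing was removed, otherwise the
     * namespace count of the removed node. A miscounted node whose count is
     * -1 is therefore indistinguishable from "nothing removed".
     */
    static long removeLastINode(Node node) {
        if (node == null) {
            return -1; // sentinel: nothing removed
        }
        return node.namespaceCount; // may also be -1 if the count was corrupted
    }

    /** Collects the blocks to delete, skipping collection on the -1 sentinel. */
    static List<Long> unprotectedDelete(Node node) {
        List<Long> collected = new ArrayList<>();
        long removed = removeLastINode(node);
        if (removed == -1) {
            return collected; // blocks are never collected on this path
        }
        collected.addAll(node.blocks);
        return collected;
    }

    public static void main(String[] args) {
        Node healthy = new Node(1, Arrays.asList(101L, 102L));
        Node miscounted = new Node(-1, Arrays.asList(201L, 202L));

        System.out.println("healthy: " + unprotectedDelete(healthy).size() + " blocks collected");
        System.out.println("miscounted: " + unprotectedDelete(miscounted).size() + " blocks collected");
    }
}
```

In this sketch the miscounted node is removed but none of its blocks are 
collected, which mirrors how stale blocks could remain in the blocksMap. One 
possible way to avoid the ambiguity, offered only as a suggestion, would be to 
signal "nothing removed" separately from the count.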

> TestFileTruncate.testTruncateEditLogLoad times out waiting for Mini HDFS 
> Cluster to start
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-7611
>                 URL: https://issues.apache.org/jira/browse/HDFS-7611
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Konstantin Shvachko
>            Assignee: Byron Wong
>         Attachments: testTruncateEditLogLoad.log
>
>
> I've seen it failing on Jenkins a couple of times. Somehow the cluster is not 
> becoming ready after NN restart.
> Not sure if it is truncate specific, as I've seen same behaviour with other 
> tests that restart the NameNode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
