[
https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Owen O'Malley updated HDFS-2053:
--------------------------------
Fix Version/s: (was: 0.20.204.0)
(was: 0.20.3)
> NameNode detects "Inconsistent diskspace" for directories with quota-enabled
> subdirectories (introduced by HDFS-1377)
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-2053
> URL: https://issues.apache.org/jira/browse/HDFS-2053
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0
> Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch
> applied.
> My impression is that the same issue also exists in the other branches to
> which the HDFS-1377 patch has been applied (see description).
> Reporter: Michael Noll
> Assignee: Michael Noll
> Priority: Minor
> Fix For: 0.20.205.0
>
> Attachments: HDFS-2053_v1.txt, HDFS-2053_v2.txt
>
>
> *How to reproduce*
> {code}
> # create test directories
> $ hadoop fs -mkdir /hdfs-1377/A
> $ hadoop fs -mkdir /hdfs-1377/B
> $ hadoop fs -mkdir /hdfs-1377/C
> # ...add some test data (few kB or MB) to all three dirs...
> # set space quota for subdir C only
> $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C
> # the following two commands _on the parent dir_ trigger the warning
> $ hadoop fs -dus /hdfs-1377
> $ hadoop fs -count -q /hdfs-1377
> {code}
> Warning message in the namenode logs:
> {code}
> 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode:
> Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355
> {code}
> Note that the commands are run on the _parent directory_ but the warning is
> shown for the _subdirectory_ with space quota.
> *Background*
> The bug was introduced by the HDFS-1377 patch, which is currently committed
> to at least branch-0.20, branch-0.20-security, branch-0.20-security-204,
> branch-0.20-security-205 and release-0.20.3-rc2. In the patch,
> {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was
> updated to trigger the warning above if the cached and computed diskspace
> values are not the same for a directory with quota.
> The warning is written by {{computeContentSummary(long[] summary)}} in
> {{INodeDirectory}}. The method recursively walks an inode's children, passing
> the {{summary}} parameter along and updating it on the way.
> {code}
> /** {@inheritDoc} */
> long[] computeContentSummary(long[] summary) {
> if (children != null) {
> for (INode child : children) {
> child.computeContentSummary(summary);
> }
> }
> {code}
> The condition that triggers the warning message compares the current node's
> cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding
> field in {{summary}}.
> {code}
> if (-1 != node.getDsQuota() && space != summary[3]) {
> NameNode.LOG.warn("Inconsistent diskspace for directory "
> +getLocalName()+". Cached: "+space+" Computed: "+summary[3]);
> {code}
> However, {{summary}} may already include diskspace information from other
> inodes at this point, i.e. from subtrees other than the subtree of the node
> for which the warning message is shown. In our example for the tree at
> {{/hdfs-1377}}, {{summary}} can already contain information from
> {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode
> {{/hdfs-1377/C}}. Hence the computed value for {{C}} can incorrectly
> differ from the cached value.
> *How to fix*
> The supplied patch creates a fresh summary array for the subtree of the
> current node. The walk through the children passes and updates this
> {{subtreeSummary}} array, and the condition is checked against
> {{subtreeSummary}} instead of the original {{summary}}. The values of
> {{subtreeSummary}} are added to the original {{summary}} before the method
> returns.
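
The difference between the two walks can be illustrated with a small, self-contained toy model. This is only a sketch, not the Hadoop code or the attached patch: the `Node` class, `buggyWalk`, `fixedWalk`, `exampleTree` and the byte sizes are hypothetical, the "cached" value is simply recomputed from the subtree, and a one-element array stands in for the diskspace slot that the report refers to as {{summary[3]}}.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the summary walk described above; these are not the Hadoop classes. */
public class QuotaSummarySketch {

    /** Hypothetical stand-in for an inode: files carry bytes, directories carry children. */
    static final class Node {
        final String name;
        final long fileBytes;      // non-zero for files only
        final boolean hasQuota;    // true if a space quota is set on this directory
        final List<Node> children = new ArrayList<>();

        Node(String name, long fileBytes, boolean hasQuota) {
            this.name = name;
            this.fileBytes = fileBytes;
            this.hasQuota = hasQuota;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Stand-in for the cached diskspace: the true size of this subtree. */
        long cachedSpace() {
            long total = fileBytes;
            for (Node child : children) {
                total += child.cachedSpace();
            }
            return total;
        }
    }

    /** Compares the cached value against the accumulator shared with sibling subtrees. */
    static void buggyWalk(Node node, long[] summary, List<String> warnings) {
        summary[0] += node.fileBytes;
        for (Node child : node.children) {
            buggyWalk(child, summary, warnings);
        }
        if (node.hasQuota && node.cachedSpace() != summary[0]) {
            warnings.add(node.name);
        }
    }

    /** Uses a fresh array per subtree and merges it into the caller's array at the end. */
    static void fixedWalk(Node node, long[] summary, List<String> warnings) {
        long[] subtreeSummary = new long[1];
        subtreeSummary[0] += node.fileBytes;
        for (Node child : node.children) {
            fixedWalk(child, subtreeSummary, warnings);
        }
        if (node.hasQuota && node.cachedSpace() != subtreeSummary[0]) {
            warnings.add(node.name);
        }
        summary[0] += subtreeSummary[0];
    }

    /** Mirrors the layout from the report: A and B without quota, C with quota. */
    static Node exampleTree() {
        Node a = new Node("A", 0, false).add(new Node("a.dat", 1000, false));
        Node b = new Node("B", 0, false).add(new Node("b.dat", 2000, false));
        Node c = new Node("C", 0, true).add(new Node("c.dat", 3000, false));
        return new Node("hdfs-1377", 0, false).add(a).add(b).add(c);
    }

    public static void main(String[] args) {
        List<String> buggy = new ArrayList<>();
        long[] buggyTotal = new long[1];
        buggyWalk(exampleTree(), buggyTotal, buggy);

        List<String> fixed = new ArrayList<>();
        long[] fixedTotal = new long[1];
        fixedWalk(exampleTree(), fixedTotal, fixed);

        System.out.println("buggy warnings: " + buggy + ", total=" + buggyTotal[0]);
        System.out.println("fixed warnings: " + fixed + ", total=" + fixedTotal[0]);
    }
}
```

In this toy tree the shared-accumulator walk flags `C` because the array already holds the bytes of `A` and `B` when `C` is checked, while the per-subtree walk flags nothing. Both walks arrive at the same total for the parent directory.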
> *Unit Tests*
> I have run "ant test" on my patched build without any errors*. However the
> existing unit tests did not catch this issue for the original HDFS-1377
> patch, so this might not mean anything. ;-)
> That said, I am unsure what the most appropriate way to unit test this issue
> would be. A straightforward approach would be to automate the steps in the
> _How to reproduce_ section above and check whether the NN logs an incorrect
> warning message. But I'm not sure how this check could be implemented. Feel
> free to provide some pointers if you have some ideas.
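
One possible way to check for the warning is to attach a capturing handler to the logger for the duration of the test and then inspect the collected messages. The sketch below only illustrates that idea and rests on assumptions: it uses `java.util.logging` so that it is self-contained, whereas the NameNode logs through a different logging API, so a real test would need the equivalent appender mechanism of that framework. The class name, `captureWarnings` and the logger name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Sketch of capturing warnings in a test; java.util.logging is an assumption here. */
public class WarningCaptureSketch {

    /** Collects every message logged at WARNING or above while attached. */
    static final class CapturingHandler extends Handler {
        final List<String> warnings = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
                warnings.add(record.getMessage());
            }
        }

        @Override
        public void flush() {
            // nothing is buffered
        }

        @Override
        public void close() {
            // no resources to release
        }
    }

    /** Runs the action and returns the warnings it logged to the given logger. */
    static List<String> captureWarnings(Logger logger, Runnable action) {
        CapturingHandler handler = new CapturingHandler();
        logger.addHandler(handler);
        try {
            action.run();
        } finally {
            logger.removeHandler(handler);
        }
        return handler.warnings;
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("sketch.namenode"); // hypothetical logger name
        log.setUseParentHandlers(false);                  // keep the console quiet

        // Stand-in for the operations from the reproduction steps.
        List<String> seen = captureWarnings(log, () ->
                log.warning("Inconsistent diskspace for directory C. Cached: 1 Computed: 2"));

        boolean found = seen.stream()
                .anyMatch(m -> m.startsWith("Inconsistent diskspace"));
        System.out.println("warning captured: " + found);
    }
}
```

A test for this issue would run the reproduction steps inside the action and assert that no captured message starts with "Inconsistent diskspace".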
> *Note about Fix Version/s*
> The patch _should_ apply to all branches to which the HDFS-1377 patch has
> been committed. In my environment, the build was Hadoop 0.20.203.0 release
> with a (trivial) backport of HDFS-1377 (0.20.203.0 release does not ship with
> the HDFS-1377 fix). I could apply the patch successfully to
> {{branch-0.20-security}}, {{branch-0.20-security-204}} and
> {{release-0.20.3-rc2}}, for instance. Since I'm a bit confused regarding the
> upcoming 0.20.x release versions (0.20.x vs. 0.20.20x.y), I have been so bold
> as to add 0.20.203.0 to the list of affected versions even though it is
> actually only affected when HDFS-1377 is applied to it...
> Best,
> Michael
> *Well, I get one error for {{TestRumenJobTraces}}, but first, this seems to be
> completely unrelated, and second, I get the same test error when running the
> tests on the stock 0.20.203.0 release build.