[ https://issues.apache.org/jira/browse/HDFS-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Collins resolved HDFS-2053.
-------------------------------

    Resolution: Fixed

I've merged this to branch-0.20-security.

> Bug in INodeDirectory#computeContentSummary warning
> ---------------------------------------------------
>
>                 Key: HDFS-2053
>                 URL: https://issues.apache.org/jira/browse/HDFS-2053
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.3, 0.20.204.0, 0.20.205.0
>         Environment: Hadoop release 0.20.203.0 with the HDFS-1377 patch
> applied.
> My impression is that the same issue exists also in the other branches where
> the HDFS-1377 patch has been applied to (see description).
>            Reporter: Michael Noll
>            Assignee: Michael Noll
>            Priority: Minor
>             Fix For: 0.20.205.0
>
>         Attachments: HDFS-2053_v1.txt, HDFS-2053_v2.txt, HDFS-2053_v3.txt,
> hdfs-2053_v3-b20.patch
>
>
> *How to reproduce*
> {code}
> # create test directories
> $ hadoop fs -mkdir /hdfs-1377/A
> $ hadoop fs -mkdir /hdfs-1377/B
> $ hadoop fs -mkdir /hdfs-1377/C
> # ...add some test data (few kB or MB) to all three dirs...
> # set space quota for subdir C only
> $ hadoop dfsadmin -setSpaceQuota 1g /hdfs-1377/C
> # the following two commands _on the parent dir_ trigger the warning
> $ hadoop fs -dus /hdfs-1377
> $ hadoop fs -count -q /hdfs-1377
> {code}
> Warning message in the namenode logs:
> {code}
> 2011-06-09 09:42:39,817 WARN org.apache.hadoop.hdfs.server.namenode.NameNode:
> Inconsistent diskspace for directory C. Cached: 433872320 Computed: 438465355
> {code}
> Note that the commands are run on the _parent directory_ but the warning is
> shown for the _subdirectory_ with space quota.
> *Background*
> The bug was introduced by the HDFS-1377 patch, which is currently committed
> to at least branch-0.20, branch-0.20-security, branch-0.20-security-204,
> branch-0.20-security-205 and release-0.20.3-rc2.
> In the patch,
> {{src/hdfs/org/apache/hadoop/hdfs/server/namenode/INodeDirectory.java}} was
> updated to trigger the warning above if the cached and computed diskspace
> values are not the same for a directory with quota.
> The warning is written by {{computeContentSummary(long[] summary)}} in
> {{INodeDirectory}}. The method recursively walks an inode's children,
> passing the {{summary}} array along and updating it on the way.
> {code}
> /** {@inheritDoc} */
> long[] computeContentSummary(long[] summary) {
>   if (children != null) {
>     for (INode child : children) {
>       child.computeContentSummary(summary);
>     }
>   }
> {code}
> The condition that triggers the warning message compares the current node's
> cached diskspace (via {{node.diskspaceConsumed()}}) with the corresponding
> field in {{summary}}.
> {code}
>   if (-1 != node.getDsQuota() && space != summary[3]) {
>     NameNode.LOG.warn("Inconsistent diskspace for directory "
>         +getLocalName()+". Cached: "+space+" Computed: "+summary[3]);
> {code}
> However, {{summary}} may already include diskspace information from other
> inodes at this point (i.e. from different subtrees than the subtree of the
> node for which the warning message is shown; in our example for the tree at
> {{/hdfs-1377}}, {{summary}} can already contain information from
> {{/hdfs-1377/A}} and {{/hdfs-1377/B}} when it is passed to inode
> {{/hdfs-1377/C}}). Hence the cached value for {{C}} can wrongly appear to
> differ from the computed value.
> *How to fix*
> The supplied patch creates a fresh summary array for the subtree of the
> current node. The walk through the children passes and updates this
> {{subtreeSummary}} array, and the condition is checked against
> {{subtreeSummary}} instead of the original {{summary}}. The original
> {{summary}} is updated with the values of {{subtreeSummary}} before the
> method returns.
> *Unit Tests*
> I have run "ant test" on my patched build without any errors*.
> However, the existing unit tests did not catch this issue for the original
> HDFS-1377 patch, so this might not mean anything. ;-)
> That said, I am unsure what the most appropriate way to unit test this issue
> would be. A straightforward approach would be to automate the steps in the
> _How to reproduce_ section above and check whether the NN logs an incorrect
> warning message. But I'm not sure how this check could be implemented. Feel
> free to provide some pointers if you have some ideas.
> *Note about Fix Version/s*
> The patch _should_ apply to all branches to which the HDFS-1377 patch has
> been committed. In my environment, the build was the Hadoop 0.20.203.0
> release with a (trivial) backport of HDFS-1377 (the 0.20.203.0 release does
> not ship with the HDFS-1377 fix). I could apply the patch successfully to
> {{branch-0.20-security}}, {{branch-0.20-security-204}} and
> {{release-0.20.3-rc2}}, for instance. Since I'm a bit confused regarding the
> upcoming 0.20.x release versions (0.20.x vs. 0.20.20x.y), I have been so bold
> as to add 0.20.203.0 to the list of affected versions even though it is
> actually only affected when HDFS-1377 is applied to it...
> Best,
> Michael
> *Well, I get one error for {{TestRumenJobTraces}}, but first, this seems to
> be completely unrelated, and second, I get the same test error when running
> the tests on the stock 0.20.203.0 release build.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
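
The approach described under *How to fix* can be sketched roughly as follows. This is a simplified, self-contained Java model, not the committed patch: `SubtreeSummarySketch`, `Node`, `FileNode`, `DirNode`, `demo` and the `warnings` list (standing in for `NameNode.LOG`) are hypothetical names introduced for illustration, and only index 3 of the summary array (diskspace) is taken from the snippets in the report. The tree mirrors the example: directories A, B and C under one parent, with a space quota on C only.

```java
import java.util.ArrayList;
import java.util.List;

public class SubtreeSummarySketch {
  // Index of the diskspace slot, as used by summary[3] in the report's snippet.
  static final int DISKSPACE = 3;

  /** Hypothetical stand-in for an inode. */
  interface Node {
    long[] computeContentSummary(long[] summary);
  }

  /** Toy file: contributes only its diskspace to the summary. */
  static final class FileNode implements Node {
    private final long diskspace;

    FileNode(long diskspace) {
      this.diskspace = diskspace;
    }

    @Override
    public long[] computeContentSummary(long[] summary) {
      summary[DISKSPACE] += diskspace;
      return summary;
    }
  }

  /** Toy directory with an optional space quota and a cached diskspace value. */
  static final class DirNode implements Node {
    final String name;
    final long dsQuota;          // -1 means "no space quota set"
    final long cachedDiskspace;  // what the quota bookkeeping believes is used
    final List<Node> children = new ArrayList<>();
    final List<String> warnings; // assumption: replaces NameNode.LOG.warn

    DirNode(String name, long dsQuota, long cachedDiskspace, List<String> warnings) {
      this.name = name;
      this.dsQuota = dsQuota;
      this.cachedDiskspace = cachedDiskspace;
      this.warnings = warnings;
    }

    @Override
    public long[] computeContentSummary(long[] summary) {
      // Fresh array: holds only what this directory's own subtree contributes.
      long[] subtreeSummary = new long[summary.length];
      for (Node child : children) {
        child.computeContentSummary(subtreeSummary);
      }
      // Compare the cached value against this subtree only, never against
      // totals accumulated from sibling subtrees.
      if (dsQuota != -1 && cachedDiskspace != subtreeSummary[DISKSPACE]) {
        warnings.add("Inconsistent diskspace for directory " + name
            + ". Cached: " + cachedDiskspace
            + " Computed: " + subtreeSummary[DISKSPACE]);
      }
      // Fold the subtree's values back into the caller's running totals.
      for (int i = 0; i < summary.length; i++) {
        summary[i] += subtreeSummary[i];
      }
      return summary;
    }
  }

  /** Builds parent/{A,B,C} with a quota on C only and summarizes the parent. */
  static long[] demo(List<String> warnings) {
    DirNode parent = new DirNode("hdfs-1377", -1, 0, warnings);
    DirNode a = new DirNode("A", -1, 0, warnings);
    DirNode b = new DirNode("B", -1, 0, warnings);
    DirNode c = new DirNode("C", 1L << 30, 300, warnings); // quota set, cache is correct
    a.children.add(new FileNode(100));
    b.children.add(new FileNode(200));
    c.children.add(new FileNode(300));
    parent.children.add(a);
    parent.children.add(b);
    parent.children.add(c);
    return parent.computeContentSummary(new long[4]);
  }

  public static void main(String[] args) {
    List<String> warnings = new ArrayList<>();
    long[] total = demo(warnings);
    System.out.println("diskspace=" + total[DISKSPACE] + " warnings=" + warnings);
  }
}
```

If the children were instead walked with the caller's `summary` array, the check for C in this toy tree would compare its cached 300 against 600 (A and B included) and emit the spurious warning. Collecting warnings in a list, as assumed here, is also one possible way to assert on them in a unit test.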