[
https://issues.apache.org/jira/browse/HDFS-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Collins updated HDFS-1377:
------------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Thanks Raghu! I've committed this to trunk, and the 22, 21, and 20 branches.
> Quota bug for partial blocks allows quotas to be violated
> ----------------------------------------------------------
>
> Key: HDFS-1377
> URL: https://issues.apache.org/jira/browse/HDFS-1377
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0, 0.23.0
> Reporter: Eli Collins
> Assignee: Eli Collins
> Priority: Blocker
> Fix For: 0.20.3, 0.21.1, 0.22.0, 0.23.0
>
> Attachments: hdfs-1377-1.patch, hdfs-1377-b20-1.patch,
> hdfs-1377-b20-2.patch, hdfs-1377-b20-3.patch, HDFS-1377.patch
>
>
> There's a bug in the quota code that allows space quotas to be violated when a
> file is not an exact multiple of the block size. Here's an example:
> {code}
> $ hadoop fs -mkdir /test
> $ hadoop dfsadmin -setSpaceQuota 384M /test
> $ ls dir/ | wc -l # dir contains 101 files
> 101
> $ du -ms dir # each is 3mb
> 304 dir
> $ hadoop fs -put dir /test
> $ hadoop fs -count -q /test
> none inf 402653184 -550502400 2 101 317718528 hdfs://haus01.sf.cloudera.com:10020/test
> $ hadoop fs -stat "%o %r" /test/dir/f30
> 134217728 3 # 128mb block size, 3 replicas
> {code}
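> The -count -q numbers above reflect the actual usage (via getContentSummary,
> see below) rather than the broken cached value; a quick runnable check of the
> arithmetic, assuming the 3x replication shown in the stat output:
> {code}
> // Sanity check of the -count -q output above: 101 files * 3mb each,
> // replicated 3x, against the 384mb space quota.
> public class QuotaMath {
>   public static void main(String[] args) {
>     long mb = 1024 * 1024;
>     long spaceQuota = 384 * mb;            // 402653184, as reported
>     long contentSize = 101 * 3 * mb;       // 317718528, pre-replication
>     long spaceConsumed = contentSize * 3;  // 953155584, with replication
>     System.out.println(spaceQuota - spaceConsumed); // -550502400, as reported
>   }
> }
> {code}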
> INodeDirectoryWithQuota caches the number of bytes consumed by its children
> in {{diskspace}}. The quota adjustment code has a bug that causes
> {{diskspace}} to be updated incorrectly when a file is not an exact multiple
> of the block size (the value ends up being negative).
> This causes the quota checking code to think that the files in the directory
> consume less space than they actually do, so verifyQuota does not throw a
> QuotaExceededException even when the directory is over quota. However, the
> bug isn't visible to users because {{fs count -q}} reports the numbers
> generated by INode#getContentSummary, which adds up the actual block sizes
> rather than using the cached INodeDirectoryWithQuota#diskspace value.
> In FSDirectory#addBlock the disk space consumed is set conservatively to the
> full block size * the number of replicas:
> {code}
> updateCount(inodes, inodes.length-1, 0,
>     fileNode.getPreferredBlockSize()*fileNode.getReplication(), true);
> {code}
> In FSNamesystem#addStoredBlock we adjust for this conservative estimate by
> subtracting out the difference between the conservative estimate and the
> number of bytes actually stored:
> {code}
> //Updated space consumed if required.
> INodeFile file = (storedBlock != null) ? storedBlock.getINode() : null;
> long diff = (file == null) ? 0 :
>     (file.getPreferredBlockSize() - storedBlock.getNumBytes());
> if (diff > 0 && file.isUnderConstruction() &&
>     cursize < storedBlock.getNumBytes()) {
>   ...
>   dir.updateSpaceConsumed(path, 0, -diff*file.getReplication());
> {code}
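> For one of the 3mb files in the example above, the adjustment works out as
> follows (a sketch, assuming the repro's 128mb block size and 3x replication):
> {code}
> // Sketch of the addStoredBlock adjustment for one 3mb file
> // (128mb preferred block size, 3x replication assumed from the repro).
> public class AddStoredBlockAdjustment {
>   public static void main(String[] args) {
>     long mb = 1024 * 1024;
>     long diff = 128 * mb - 3 * mb;  // 125mb
>     long adjustment = -diff * 3;    // -375mb
>     // The conservative charge was 384mb; after this one adjustment the
>     // cached diskspace is 9mb (3mb * 3 replicas), which is correct.
>     System.out.println((384 * mb + adjustment) / mb); // prints 9
>   }
> }
> {code}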
> We do the same in FSDirectory#replaceNode when completing the file, but at a
> file granularity (I believe the intent here is to correct for cases where
> there's a failure replicating blocks and recovery). Since oldnode is under
> construction, INodeFile#diskspaceConsumed will use the preferred block size
> (vs. Block#getNumBytes used by newnode), so we will again subtract out the
> difference between the full block size and the number of bytes actually
> stored:
> {code}
> long dsOld = oldnode.diskspaceConsumed();
> ...
> //check if disk space needs to be updated.
> long dsNew = 0;
> if (updateDiskspace && (dsNew = newnode.diskspaceConsumed()) != dsOld) {
>   try {
>     updateSpaceConsumed(path, 0, dsNew-dsOld);
>     ...
> {code}
> So in the above example, for each file we start with diskspace at 384mb (3
> replicas * 128mb) and then subtract 375mb (to reflect that only 9mb raw is
> actually used) twice, so each file decreases the directory's diskspace by
> 366mb (384mb minus 2 * 375mb). That's why the cached value goes negative and
> yet we can still write more files.
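> Putting the three steps together for a single 3mb file, as a runnable sketch
> of the arithmetic above (the step labels refer to the code paths quoted
> earlier, not to real HDFS APIs):
> {code}
> // Models the three accounting steps for one 3mb file (3x replication,
> // 128mb preferred block size). Step labels are illustrative only.
> public class DoubleSubtraction {
>   static final long MB = 1024 * 1024;
>   public static void main(String[] args) {
>     long diskspace = 0;
>     diskspace += 128 * MB * 3;          // FSDirectory#addBlock:      +384mb
>     diskspace -= (128 - 3) * MB * 3;    // addStoredBlock adjustment: -375mb
>     diskspace -= (128 - 3) * MB * 3;    // replaceNode adjustment:    -375mb (the bug)
>     System.out.println(diskspace / MB); // prints -366
>   }
> }
> {code}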
> So a directory with lots of files that end in a partial block (for files with
> multiple blocks, only the final partial block triggers the extra subtraction)
> ends up with a cached diskspace value that's way off.
> I think the fix is to have FSDirectory#replaceNode skip the diskspaceConsumed
> adjustment when the old and new INodes have the same blocks. I'll work on a
> patch that also adds a quota test for files that are not multiples of the
> block size and warns in INodeDirectory#computeContentSummary if the computed
> size does not match the cached value.
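> For illustration, the guard in FSDirectory#replaceNode could look something
> like the fragment below (a sketch of the idea only, not the actual patch; it
> assumes the old and new INodes expose their block lists via getBlocks):
> {code}
> // Sketch only: skip the adjustment when oldnode and newnode refer to the
> // same blocks, since addStoredBlock already corrected the conservative
> // estimate. Uses java.util.Arrays; getBlocks() is assumed here.
> long dsOld = oldnode.diskspaceConsumed();
> ...
> long dsNew = 0;
> boolean sameBlocks = Arrays.equals(oldnode.getBlocks(), newnode.getBlocks());
> if (updateDiskspace && !sameBlocks
>     && (dsNew = newnode.diskspaceConsumed()) != dsOld) {
>   try {
>     updateSpaceConsumed(path, 0, dsNew - dsOld);
>     ...
> {code}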