Pranav Prakash created HDFS-10529:
-------------------------------------
Summary: Df reports incorrect usage when appending less than block
size
Key: HDFS-10529
URL: https://issues.apache.org/jira/browse/HDFS-10529
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.7.2, 3.0.0-alpha1
Reporter: Pranav Prakash
Priority: Minor
Steps to recreate issue:
1. Create a 100MB file on HDFS cluster with 128MB blocksize and replication
factor 3
2. Append 100MB to the file
3. Df reports around 900MB even though it should only be around 600MB.
Looking at the blocks confirms that df is incorrect, as there exist only two
blocks on each DN -- a 128MB block and a 72MB block.
This issue seems to arise because BlockPoolSlice does not account for the delta
increase in dfsUsage when an append happens to a partially-filled block, and
instead naively adds the total block size. For instance, in the example
scenario when the block is "filled" from 100MB to 128MB, addFinalizedBlock() in
BlockPoolSlice adds the size of the newly created block into the total instead
of accounting for the difference/delta in block size between old and new. This
has the effect of double-counting the old partially-filled block: it is counted
once when it is first created (in the example scenario when the 100MB file is
created) and again when it becomes part of the filled block (in the example
scenario when the 128MB block is formed from the initial 100MB block). Thus the
perceived size becomes 100MB + 128MB + 72MB = 300MB for each DN, or 900MB across
the cluster.
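
The arithmetic above can be reproduced with a small standalone model. This is
only a sketch of the two accounting strategies, not the actual BlockPoolSlice
code: the class DfsUsageSketch and its methods addFinalizedNaive() and
addFinalizedDelta() are hypothetical names invented for illustration, and the
model assumes usage is tracked as a single running counter per DN.

```java
// Hypothetical model of per-DataNode usage accounting.
// It is NOT the real BlockPoolSlice; all names here are illustrative assumptions.
final class DfsUsageSketch {
    static final long MB = 1024L * 1024L;

    // Running total of bytes this DataNode believes it is using.
    private long dfsUsed;

    // Naive accounting: add the full length of every block that is finalized,
    // even if the same block was already counted at a shorter length.
    void addFinalizedNaive(long newLength) {
        dfsUsed += newLength;
    }

    // Delta accounting: add only the growth since the block was last counted.
    // previousLength is 0 for a block that has never been finalized before.
    void addFinalizedDelta(long previousLength, long newLength) {
        dfsUsed += newLength - previousLength;
    }

    long usedMb() {
        return dfsUsed / MB;
    }

    // Scenario from the report, replayed with naive accounting.
    static long naiveScenarioMb() {
        DfsUsageSketch dn = new DfsUsageSketch();
        dn.addFinalizedNaive(100 * MB); // initial 100MB file, one partial block
        dn.addFinalizedNaive(128 * MB); // same block re-finalized after append fills it
        dn.addFinalizedNaive(72 * MB);  // second block holding the rest of the append
        return dn.usedMb();
    }

    // Same scenario, counting only the difference between old and new length.
    static long deltaScenarioMb() {
        DfsUsageSketch dn = new DfsUsageSketch();
        dn.addFinalizedDelta(0, 100 * MB);        // initial 100MB block
        dn.addFinalizedDelta(100 * MB, 128 * MB); // block grows by 28MB
        dn.addFinalizedDelta(0, 72 * MB);         // new 72MB block
        return dn.usedMb();
    }

    public static void main(String[] args) {
        int replication = 3;
        System.out.println("naive: " + naiveScenarioMb() + "MB per DN, "
                + naiveScenarioMb() * replication + "MB across the cluster");
        System.out.println("delta: " + deltaScenarioMb() + "MB per DN, "
                + deltaScenarioMb() * replication + "MB across the cluster");
    }
}
```

With replication factor 3, the naive path yields the 900MB that df reports,
while the delta path yields the expected 600MB.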