[jira] [Commented] (HDFS-10529) Df reports incorrect usage when appending less than block size

Lei (Eddy) Xu (JIRA) Wed, 15 Jun 2016 15:16:47 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332719#comment-15332719
 ]


Lei (Eddy) Xu commented on HDFS-10529:
--------------------------------------

Hi [~pranavprakash], thanks a lot for the fixes.

Would you mind adding a unit test to verify that the problem existed before 
your patch and is fixed after it?
Thanks!

> Df reports incorrect usage when appending less than block size
> --------------------------------------------------------------
>
>                 Key: HDFS-10529
>                 URL: https://issues.apache.org/jira/browse/HDFS-10529
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.2, 3.0.0-alpha1
>            Reporter: Pranav Prakash
>            Assignee: Pranav Prakash
>            Priority: Minor
>              Labels: datanode, fs, hdfs
>         Attachments: HDFS-10529.000.patch
>
>
> Steps to recreate issue:
> 1. Create a 100MB file on HDFS cluster with 128MB blocksize and replication 
> factor 3
> 2. Append 100MB to the file
> 3. Df reports around 900MB even though it should only be around 600MB.
> Looking at the blocks confirms that df is incorrect, as there exist only two 
> blocks on each DN -- a 128MB block and a 72MB block.
> This issue seems to arise because BlockPoolSlice does not account for the 
> delta increase in dfsUsage when an append happens to a partially-filled 
> block, and instead naively adds the total block size. For instance, in the 
> example scenario, when the block is "filled" from 100MB to 128MB, 
> addFinalizedBlock() in BlockPoolSlice adds the size of the newly created 
> block into the total instead of accounting for the difference/delta in block 
> size between old and new.  This has the effect of double-counting the old 
> partially-filled block: it is counted once when it is first created (in the 
> example scenario when the 100MB file is created) and again when it becomes 
> part of the filled block (in the example scenario when the 128MB block is 
> formed from the initial 100MB block). Thus the perceived size becomes 100MB + 
> 128MB + 72MB = 300MB for each DN, or 900MB across the cluster.
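
The arithmetic in the description can be checked with a small toy model. This is only a sketch in Python, not Hadoop code: the UsageCounter class, its add_finalized_block method, and the use_delta flag are hypothetical names made up for illustration, and block metadata files are ignored. It compares adding the full block length on every finalize against adding only the delta over the length already counted.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB
REPLICATION = 3


class UsageCounter:
    """Toy stand-in for a per-DataNode dfsUsed counter (hypothetical, not Hadoop code)."""

    def __init__(self, use_delta):
        self.use_delta = use_delta
        self.dfs_used = 0

    def add_finalized_block(self, new_len, old_len=0):
        # old_len is the length already counted for this block before the append.
        if self.use_delta:
            # Delta accounting: only the growth of the block is added.
            self.dfs_used += new_len - old_len
        else:
            # Naive accounting: the full new length is added again.
            self.dfs_used += new_len


def simulate(use_delta):
    """Replay the steps from the report and return per-DataNode usage in MB."""
    counter = UsageCounter(use_delta)
    # Step 1: create a 100MB file, giving one partially filled block.
    counter.add_finalized_block(100 * MB)
    # Step 2: append 100MB. The first block grows from 100MB to 128MB,
    # and the remaining 72MB lands in a new block.
    counter.add_finalized_block(BLOCK_SIZE, old_len=100 * MB)
    counter.add_finalized_block(72 * MB)
    return counter.dfs_used // MB


if __name__ == "__main__":
    print("naive, cluster-wide MB:", simulate(False) * REPLICATION)
    print("delta, cluster-wide MB:", simulate(True) * REPLICATION)
```

Under these assumptions the naive counter reaches 300MB per DN (900MB across three replicas), while the delta counter reaches 200MB per DN (600MB), which matches the numbers in the report. A unit test for the real patch could make the same before/after comparison against the actual dfsUsed value.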



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

