[
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260954#comment-15260954
]
Ravi Prakash commented on HDFS-6489:
------------------------------------
The problem is here:
https://github.com/apache/hadoop/blob/f16722d2ef31338a57a13e2c8d18c1c62d58bbaf/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java#L323
. Even though this is an append, {{dfsUsage}} is incremented by the total
block size every time, not just by the appended bytes. This is easy to see
by running {{testFrequentAppend}} (included in Weiwei's patch) and adding a
log line after line 323.
As far as I can see, this bug has existed since 2012, but it only recently
became a problem because we started considering dfsUsed space when deciding
whether to write a block.
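To illustrate the pattern, here is a minimal sketch of the accounting bug
and one possible fix; the class, field, and method names are illustrative
only, not the actual {{BlockPoolSlice}} code:
{code:java}
// Minimal sketch of the dfsUsage accounting bug; names are illustrative,
// not the real BlockPoolSlice members.
class BlockPoolSliceSketch {
  private long dfsUsage; // bytes this block pool believes are in use

  // Buggy pattern: called for appends as well as new blocks, so the
  // replica's full on-disk length is added again on every append.
  void addToDfsUsageBuggy(long blockLengthOnDisk) {
    dfsUsage += blockLengthOnDisk;
  }

  // Possible fix: on append, account only for the bytes actually added.
  void addToDfsUsageFixed(long lengthBefore, long lengthAfter) {
    dfsUsage += (lengthAfter - lengthBefore);
  }
}
{code}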
> DFS Used space is not correctly computed on frequent append operations
> ----------------------------------------------------------------------
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.2.0, 2.7.1, 2.7.2
> Reporter: stanley shi
> Assignee: Weiwei Yang
> Attachments: HDFS-6489.001.patch, HDFS-6489.002.patch,
> HDFS-6489.003.patch, HDFS6489.java
>
>
> The current implementation of the Datanode increases the DFS used space
> on every block write operation. This is correct in the common case
> (creating a new file), but it behaves incorrectly in others (appending a
> small amount of data to a large block).
> For example, I have a file with only one block (say, 60M), and I append
> to it very frequently, 10 bytes at a time. On each append, DFS used is
> increased by the length of the whole block (60M), not the actual data
> length (10 bytes).
> Consider a scenario where many clients append concurrently to a large
> number of files (1000+). Assuming the block size is 32M (half of the
> default value), DFS used will be increased by 1000*32M = 32G on each
> round of appends, even though only about 10K bytes were actually written
> (see the worked numbers after this report). This causes the datanode to
> report insufficient disk space on data writes.
> {quote}2014-06-04 15:27:34,719 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException:
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306,
> FINALIZED{quote}
> But the actual disk usage shows plenty of free space:
> {noformat}
> [root@hdsh143 ~]# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda3        16G  2.9G   13G  20% /
> tmpfs           1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1        97M   32M   61M  35% /boot
> {noformat}
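To make the arithmetic in the report concrete, here is a small
back-of-the-envelope check under the reporter's assumptions (1000 files,
one 32M block each, one 10-byte append per file); the class is purely
illustrative:
{code:java}
// Back-of-the-envelope check of the dfsUsed over-count described above.
public class DfsUsedOvercount {
  public static void main(String[] args) {
    long files = 1000;
    long blockSize = 32L * 1024 * 1024; // 32M per block
    long appendBytes = 10;              // bytes actually appended per file

    long actualGrowth = files * appendBytes; // ~10K of real data written
    long reportedGrowth = files * blockSize; // ~32G added to dfsUsed

    System.out.printf("actual: %d bytes, reported: %d bytes (%.0fx over-count)%n",
        actualGrowth, reportedGrowth, (double) reportedGrowth / actualGrowth);
  }
}
{code}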