[
https://issues.apache.org/jira/browse/HDFS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251140#comment-15251140
]
Weiwei Yang commented on HDFS-6489:
-----------------------------------
[~raviprak] Thanks for looking at this.
#1 Yes, this issue can be reproduced by appending to the same file many times
(closing the stream each time), and also by appending to different files. What
really matters is calling the append API many times within a short time window.
#2 I was proposing to wait for the DU thread to refresh only when a datanode
finds there is not enough space for an append operation, so the wait happens
only when it can actually help (rather than letting the append fail). And once
the space usage is updated, there is no further waiting until the problem comes
up again. I'd love to know if you have any alternative approach.
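The check-and-wait flow described in #2 can be sketched roughly as follows. The class and method names here (SpaceTracker, refreshNow, hasRoomForAppend) are hypothetical illustrations, not actual Hadoop APIs; the real patch works against the datanode's DU-based usage tracking:

```java
// Sketch of the idea in #2: only force a usage refresh when the cached
// value would reject an append, so the common case pays no extra cost.
public class AppendSpaceCheck {
    interface SpaceTracker {
        long getDfsUsed();   // cached value, normally refreshed periodically
        void refreshNow();   // force an immediate, blocking re-scan
    }

    static boolean hasRoomForAppend(SpaceTracker du, long capacity,
                                    long bytesNeeded) {
        long available = capacity - du.getDfsUsed();
        if (available >= bytesNeeded) {
            return true;     // fast path: cached value is good enough
        }
        // Only when the (possibly stale, inflated) cached value says "full"
        // do we pay for a refresh -- exactly when waiting can turn a
        // would-be DiskOutOfSpaceException into a successful append.
        du.refreshNow();
        return capacity - du.getDfsUsed() >= bytesNeeded;
    }
}
```

The design point is that the refresh is demand-driven: appends never wait unless the cached usage is about to cause a spurious failure.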
I'll upload a patch that applies to the latest trunk shortly. Thanks for
looking into this.
> DFS Used space is not correct computed on frequent append operations
> --------------------------------------------------------------------
>
> Key: HDFS-6489
> URL: https://issues.apache.org/jira/browse/HDFS-6489
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.2.0, 2.7.1, 2.7.2
> Reporter: stanley shi
> Assignee: Weiwei Yang
> Attachments: HDFS-6489.001.patch, HDFS-6489.002.patch,
> HDFS-6489.003.patch, HDFS6489.java
>
>
> The current implementation of the Datanode increases the DFS used space on
> each block write operation. This is correct in most scenarios (creating a new
> file), but it behaves incorrectly in some cases (appending a small amount of
> data to a large block).
> For example, I have a file with only one block (say, 60M). Then I append to
> it very frequently, but each time I append only 10 bytes.
> On each append, DFS used is increased by the length of the block (60M), not
> the actual data length (10 bytes).
> Consider a scenario where I use many clients to append concurrently to a
> large number of files (1000+). Assume the block size is 32M (half of the
> default value); then DFS used will be increased by 1000*32M = 32G on each
> round of appends to the files, even though I actually wrote only 10K bytes.
> This causes the datanode to report insufficient disk space on data write.
> {quote}2014-06-04 15:27:34,719 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock
> BP-1649188734-10.37.7.142-1398844098971:blk_1073742834_45306 received
> exception org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException:
> Insufficient space for appending to FinalizedReplica, blk_1073742834_45306,
> FINALIZED{quote}
> But the actual disk usage:
> {quote}
> [root@hdsh143 ~]# df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda3        16G  2.9G   13G  20% /
> tmpfs           1.9G   72K  1.9G   1% /dev/shm
> /dev/sda1        97M   32M   61M  35% /boot
> {quote}
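The over-counting arithmetic described in the report above can be sketched as plain Java. This mirrors the 1000-file / 32M-block / 10-byte-append example; it is illustrative arithmetic only, not Hadoop code:

```java
// Illustrates the accounting bug: if each append charges the full block
// length instead of the bytes actually written, reported DFS used explodes.
public class DfsUsedOvercount {
    static final long MB = 1024L * 1024L;

    // Buggy accounting: charge the whole finalized block length per append.
    static long buggyDelta(long blockLen, long appendedBytes) {
        return blockLen;
    }

    // Correct accounting: charge only the bytes actually appended.
    static long correctDelta(long blockLen, long appendedBytes) {
        return appendedBytes;
    }

    public static void main(String[] args) {
        long files = 1000, blockLen = 32 * MB, appended = 10;
        long buggy = files * buggyDelta(blockLen, appended);     // ~32G, as in the report
        long correct = files * correctDelta(blockLen, appended); // 10K bytes
        System.out.println(buggy + " bytes charged vs " + correct + " written");
    }
}
```

The gap between the two deltas is what makes the datanode's cached usage hit "full" long before the disk does, producing the DiskOutOfSpaceException above while df still shows plenty of free space.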
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)