[ 
https://issues.apache.org/jira/browse/HDFS-11237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768107#comment-15768107
 ] 

Manoj Govindassamy commented on HDFS-11237:
-------------------------------------------

Hi [~bergenholtz],

{{fs -du}}, that is {{getContentSummary}}, is a NameNode-only operation. 
The NameNode does keep an accounting of all allocated blocks for a file, but 
for files that are currently being written it has no visibility into how much 
data has been written to the last, UNDER_CONSTRUCTION block on the various 
DataNodes. In other words, the NameNode usually lacks byte-level length 
accuracy for files that are still being written. It eventually catches up when 
the file is closed or when the client runs hflush or hsync on the write 
stream. Not just the {{du}} command, but other length-query commands/APIs such 
as {{ls}} or {{getFileStatus}} show the same behavior. 
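
For reference, a minimal sketch of where this shows up, using the standard 
{{FileSystem}} Java API (the path {{/tmp/hdfs-11237-demo}} and the class name 
are made up for illustration): length queries issued while the output stream 
is still open may lag behind what has actually been written, and catch up 
after hsync/close as described above.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UnderConstructionLengthDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/tmp/hdfs-11237-demo");   // hypothetical test path

    FSDataOutputStream out = fs.create(p, true);
    out.write(new byte[4 * 1024]);               // 4 KB written, file still open

    // While the file is under construction, these NameNode-side views
    // (getFileStatus / getContentSummary, the same code path fs -du uses)
    // may report a stale length for the last block.
    System.out.println("getFileStatus length  : " + fs.getFileStatus(p).getLen());
    System.out.println("getContentSummary len : " + fs.getContentSummary(p).getLength());

    // hsync (or hflush) pushes the buffered data to the DataNode pipeline;
    // per the comment above, the reported length catches up after this or
    // at the latest when the stream is closed.
    out.hsync();
    out.close();

    System.out.println("length after close    : " + fs.getFileStatus(p).getLen());
    fs.close();
  }
}
{code}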

> NameNode reports incorrect file size
> ------------------------------------
>
>                 Key: HDFS-11237
>                 URL: https://issues.apache.org/jira/browse/HDFS-11237
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.1
>            Reporter: Erik Bergenholtz
>
> The [hdfs] file /data/app-logs/log is continuously being written to by a 
> YARN process.
> However, checking the file size through: 
> hadoop fs -du /data/app-logs/log shows an incorrect file size after a few 
> minutes.


