[ 
https://issues.apache.org/jira/browse/HDFS-11237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768107#comment-15768107
 ] 

Manoj Govindassamy edited comment on HDFS-11237 at 12/21/16 8:48 PM:
---------------------------------------------------------------------

Hi [~bergenholtz],

{{fs -du}}, that is {{getContentSummary}}, is a NameNode-only operation. 
NameNode does have an account of all allocated blocks for a file. But for 
files currently being written, NameNode has no visibility into how much data 
has been written to the last, UNDER_CONSTRUCTION block on the various 
DataNodes. That is, NameNode usually lacks byte-level length accuracy for 
files being written. NameNode eventually catches up on the block and file 
length when the file is closed, or when the client runs hflush or hsync on 
the write stream. Not just the {{du}} command, but other length-query 
commands/APIs like {{ls}} or {{getFileStatus}} show the same behavior. 
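If an application needs the NameNode to reflect the length of a file that is still open, the writer can sync with the UPDATE_LENGTH flag, which tells the NameNode to update the file length as part of the sync. A minimal sketch, assuming the HDFS client libraries on the classpath and a reachable cluster; the class name {{SyncLengthExample}} and the sample payload are illustrative:

```java
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

public class SyncLengthExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/data/app-logs/log");

    FSDataOutputStream out = fs.create(p);
    out.write("some log line\n".getBytes("UTF-8"));

    // Plain hsync() persists the data on the DataNodes but does not
    // refresh the file length on the NameNode; UPDATE_LENGTH does both.
    if (out instanceof HdfsDataOutputStream) {
      ((HdfsDataOutputStream) out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
    }

    // getFileStatus() (and fs -du) now reflect the bytes written so far,
    // even though the file is still open for write.
    System.out.println(fs.getFileStatus(p).getLen());

    out.close();
  }
}
```

Note that UPDATE_LENGTH forces an extra NameNode RPC per sync, so it should be used only where up-to-date length reporting is actually needed.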


was (Author: manojg):
Hi [~bergenholtz],

{{fs -du}}, that is {{getContentSummary}} is a NameNode only operation. 
NameNode does have an account of all allocated blocks for a file. For the 
currently being-written files, NameNode doesn't have visibility on the amount 
of data being written to the last or the UNDER_CONSTRUCTION Block on various 
DataNodes. That is, NameNode most of the times lacks the byte level length 
accuracy for the files being written. NameNode eventually catches up when the 
file length is closed or when client runs hflush or hsync on the write stream. 
Not just {{du}} command, but other length query commands/APIs like like {{ls}} 
or {{getFileStatus}}  also have the similar behavior. 

> NameNode reports incorrect file size
> ------------------------------------
>
>                 Key: HDFS-11237
>                 URL: https://issues.apache.org/jira/browse/HDFS-11237
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.1
>            Reporter: Erik Bergenholtz
>
> The [hdfs] file /data/app-logs/log is continuously being written to by yarn 
> process.
> However, checking the file size through: 
> hadoop fs -du /data/app-logs/log shows incorrect file-size after a few 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
