Manoj Govindassamy created HDFS-11435:
-----------------------------------------

             Summary: NameNode should track open for write files lengths more 
frequent than on newer block allocations
                 Key: HDFS-11435
                 URL: https://issues.apache.org/jira/browse/HDFS-11435
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy


*Problem:*
Currently, the length of an open-for-write (under-construction) file is updated 
on the NameNode only in the following cases:

# Block boundary: On a block boundary, when a new block is allocated, the 
NameNode learns of the file growth and the recorded file length catches up.
# hsync(SyncFlag.UPDATE_LENGTH): When a client application invokes hsync on the 
write stream with this special flag, the DataNodes send an incremental block 
report with the latest file length, which the NameNode uses to update its metadata.
# First hflush() on a new block: When a client application calls hflush() for 
the first time on each new block, the DataNodes notify the NameNode of the 
latest file length.
# Output stream close: Forces the DataNodes to update the NameNode with the file 
length after data persistence and proper acknowledgements in the pipeline.
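
The four cases above can be illustrated with a small toy model. This is only a sketch and not HDFS code: the class OpenFileLengthModel, its methods, and its nested SyncFlag enum are hypothetical stand-ins invented for illustration, and the model assumes that reaching or crossing a block boundary counts as a new block allocation and that a sync without UPDATE_LENGTH behaves like hflush().

```java
import java.util.EnumSet;

/** Toy model (not HDFS code) of when the NameNode's recorded length catches up. */
final class OpenFileLengthModel {
    /** Hypothetical stand-in for the sync flag named in the issue. */
    enum SyncFlag { UPDATE_LENGTH }

    private final long blockSize;
    private long clientLength;              // bytes written by the client so far
    private long nameNodeLength;            // length recorded in NameNode metadata
    private boolean flushedInCurrentBlock;  // has hflush() been seen in this block?

    OpenFileLengthModel(long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    /** Case 1: the NameNode catches up only when a block boundary is reached. */
    void write(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        long blocksBefore = clientLength / blockSize;
        clientLength += bytes;
        long blocksAfter = clientLength / blockSize;
        if (blocksAfter > blocksBefore) {
            // Assumption: the NameNode learns the length of all completed blocks.
            nameNodeLength = blocksAfter * blockSize;
            flushedInCurrentBlock = false;
        }
    }

    /** Case 3: only the first hflush() in each block updates the NameNode. */
    void hflush() {
        if (!flushedInCurrentBlock) {
            nameNodeLength = clientLength;
            flushedInCurrentBlock = true;
        }
    }

    /** Case 2: a sync carrying UPDATE_LENGTH always updates the NameNode. */
    void hsync(EnumSet<SyncFlag> flags) {
        if (flags.contains(SyncFlag.UPDATE_LENGTH)) {
            nameNodeLength = clientLength;
        } else {
            hflush(); // assumption of this model, see the lead-in
        }
    }

    /** Case 4: closing the stream makes the recorded length final. */
    void close() {
        nameNodeLength = clientLength;
    }

    long clientLength() { return clientLength; }

    long nameNodeLength() { return nameNodeLength; }

    /** Bytes by which the NameNode trails the client. */
    long lag() { return clientLength - nameNodeLength; }

    public static void main(String[] args) {
        OpenFileLengthModel file = new OpenFileLengthModel(128L);
        file.write(100);
        System.out.println("lag after write: " + file.lag());
        file.hflush();
        System.out.println("lag after first hflush: " + file.lag());
        file.write(20);
        file.hflush();
        System.out.println("lag after second hflush: " + file.lag());
        file.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
        System.out.println("lag after hsync(UPDATE_LENGTH): " + file.lag());
    }
}
```

In this model the second hflush() within the same block leaves the lag untouched, which is the gap the issue describes: between these events the recorded length can trail the client by up to a block.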

As a result, the length the NameNode records for an open-for-write file is 
usually much smaller than the length seen by the DataNodes and the client. It 
would be highly preferable for the NameNode not to lag by as much as a block 
size for under-construction files, and to have a more frequent, scalable 
update mechanism for these open file lengths.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

