[ https://issues.apache.org/jira/browse/HDFS-11435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878892#comment-15878892 ]
Ravi Prakash commented on HDFS-11435:
-------------------------------------
Thanks for filing the JIRA Manoj! It sure is an interesting idea.
Is there any purpose to updating the file length more frequently? Could you
please give a use case if you have one in mind?
I probably lack the imagination, but I'm curious to know what use case is
solved by having the length updated every heartbeat and not every block
allocation / hsync / close? In both cases, clients will have to trust whatever
the DataNodes holding the block report. Perhaps this is just to check the
progress of a very slow writer? Is the overhead imposed on the NameNode worth
this additional benefit? I am surely not averse to having the option of
checking the length deeply, but I wonder what the overhead will be on large
clusters.
> NameNode should track open for write files lengths more frequent than on
> newer block allocations
> ------------------------------------------------------------------------------------------------
>
> Key: HDFS-11435
> URL: https://issues.apache.org/jira/browse/HDFS-11435
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
>
> *Problem:*
> Currently, the length of an open-for-write (under-construction) file is
> updated on the NameNode only in the following cases:
> # Block boundary: when a new block is allocated at a block boundary, the
> NameNode learns of the file's growth and the file length catches up.
> # hsync(SyncFlag.UPDATE_LENGTH): when a client application invokes hsync on
> the write stream with the UPDATE_LENGTH flag, DataNodes send an incremental
> block report with the latest file length, which the NameNode uses to update
> its metadata.
> # First hflush() on a new block: when a client application calls hflush() for
> the first time on each new block, DataNodes notify the NameNode of the latest
> file length.
> # Output stream close: closing the stream forces DataNodes to update the
> NameNode with the file length once the data is persisted and properly
> acknowledged in the pipeline.
> As a result, the NameNode's length for an open-for-write file usually lags
> well behind the length seen by the DataNodes and the client. It is highly
> preferable for the NameNode not to lag by as much as a block size on
> under-construction files, and to have a more frequent, scalable mechanism for
> updating these open file lengths.
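To make the lag concrete, here is a toy Java model of the four update triggers listed in the description. It is a hypothetical simulation: the class OpenFileLengthModel and its methods are invented for illustration and are not HDFS APIs. It assumes the NameNode length jumps to the last block boundary when a new block is allocated, and to the client-side length on the first hflush() of each block, on hsync(SyncFlag.UPDATE_LENGTH), and on close(). It ignores pipeline acknowledgements, failures, and which component actually sends the update.

```java
// Hypothetical toy model, NOT HDFS code: tracks when the NameNode's recorded length
// for an open-for-write file catches up with the bytes the client has written.
final class OpenFileLengthModel {
    private final long blockSize;
    private long clientLength;         // bytes written, as seen by the client/DataNodes
    private long nameNodeLength;       // length the NameNode currently records
    private boolean firstFlushPending; // true until the first hflush() on the current block

    OpenFileLengthModel(long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    // Number of blocks needed to hold len bytes (ceiling division).
    private long blocksFor(long len) {
        return (len + blockSize - 1) / blockSize;
    }

    // Trigger 1 (block boundary): allocating a new block lets the NameNode count
    // every earlier block as full; the new block is counted as length 0.
    void write(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        long newLength = clientLength + n;
        long blocksAfter = blocksFor(newLength);
        if (blocksAfter > blocksFor(clientLength)) {
            nameNodeLength = Math.max(nameNodeLength, (blocksAfter - 1) * blockSize);
            firstFlushPending = true;
        }
        clientLength = newLength;
    }

    // Trigger 3 (first hflush on a new block): in this model, later hflush()
    // calls on the same block leave the NameNode length unchanged.
    void hflush() {
        if (firstFlushPending) {
            nameNodeLength = clientLength;
            firstFlushPending = false;
        }
    }

    // Trigger 2 (hsync with UPDATE_LENGTH): always pushes the current length.
    void hsyncUpdateLength() {
        nameNodeLength = clientLength;
        firstFlushPending = false;
    }

    // Trigger 4 (close): finalizes the length.
    void close() {
        nameNodeLength = clientLength;
        firstFlushPending = false;
    }

    long clientLength() { return clientLength; }
    long nameNodeLength() { return nameNodeLength; }
    long lag() { return clientLength - nameNodeLength; }

    public static void main(String[] args) {
        OpenFileLengthModel m = new OpenFileLengthModel(100); // toy 100-byte blocks
        m.write(50);
        m.hflush();                               // first hflush on block 0
        m.write(40);
        m.hflush();                               // second hflush: no NameNode update
        System.out.println(m.lag());              // prints 40
        m.write(70);                              // crosses into block 1
        System.out.println(m.nameNodeLength());   // prints 100
        m.write(30);
        m.hsyncUpdateLength();
        System.out.println(m.lag());              // prints 0
        m.write(10);
        m.close();
        System.out.println(m.nameNodeLength());   // prints 200
    }
}
```

Between triggers the lag grows with every write; if the writer never calls hflush() or hsync(), the model's lag can reach a full block before the next allocation, which matches the block-size gap described in the issue.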
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)