[ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524144 ]

Doug Cutting commented on HADOOP-1700:
--------------------------------------

> If the Namenode then assigned this <oldid> to a new file, there would be a 
> block-id collision.

As you observe, this can also happen in other cases, e.g., when a datanode is 
offline while a file is deleted.  So I don't think using new blockids to 
implement appends makes this much more likely.
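
To make that concrete, here is a toy sketch of the kind of block-report check 
that already has to exist for the offline-datanode case: an id the namenode no 
longer tracks is simply scheduled for deletion, and the same path would catch a 
pre-append replica under a retired id.  The Java below is purely illustrative; 
BlockReportHandler, blocksMap, and scheduleDeletion are hypothetical names, not 
actual Hadoop internals.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class BlockReportHandler {
        // Block id -> owning file, for blocks of currently live files.
        private final Map<Long, String> blocksMap = new HashMap<Long, String>();

        // Process a datanode's block report; unknown ids get scheduled
        // for deletion.
        void processReport(String datanode, List<Long> reportedBlockIds) {
            for (long id : reportedBlockIds) {
                if (!blocksMap.containsKey(id)) {
                    // An id the namenode no longer tracks, e.g. because the
                    // datanode was offline when the file was deleted.  A
                    // stale pre-append replica would be caught the same way.
                    scheduleDeletion(datanode, id);
                }
            }
        }

        private void scheduleDeletion(String datanode, long id) {
            System.out.println("ask " + datanode + " to delete stale block " + id);
        }
    }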

The primary advantage you've presented for versioning blocks over allocating 
new blockids is that version numbers might not have to be persisted.  But 
versions come at a cost: they take more memory; they have some worrisome edge 
conditions; and they introduce new concepts into a complex system rather than 
building on existing, debugged concepts.  None of these is fatal, but I don't 
yet see versioning as the clearly winning strategy.
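
For concreteness, here is a minimal sketch of the versioned-block alternative 
being weighed; VersionedBlock and isStaleReplica are hypothetical names, not 
from any proposed patch, and the extra per-block field is the memory cost 
mentioned above.

    // Hypothetical sketch, not actual Hadoop code.
    class VersionedBlock {
        final long blockId;
        long version;  // bumped on each append; this extra field costs memory

        VersionedBlock(long blockId, long version) {
            this.blockId = blockId;
            this.version = version;
        }

        // A reported replica is stale if its version lags the current one.
        boolean isStaleReplica(long reportedVersion) {
            return reportedVersion < version;
        }
    }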

> If we use a revision number that takes the form of a timestamp, it can be 
> used to distinguish not only out-of-date replicas of a currently existing 
> file but also those from old, long-deleted files.

I don't follow how this would work.  Can you explain more?  How would it help 
in the situation I described above, where only a single replica is updated, 
its datanode dies, and the namenode is restarted?  The file should be corrupt, 
since the only up-to-date replica of one of its blocks is missing.  How would 
you detect that?  Would you declare any file corrupt whose last-modified time 
is later than that of any of its blocks?  That seems fragile.
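
To spell out the fragility, here is a hypothetical sketch of the 
last-modified-time check described in the question above; the names are 
invented, and the comments note ways such a check could misfire.

    // Hypothetical sketch of the timestamp check described above.
    class TimestampCheck {
        // Declares a file corrupt if its last-modified time is later than
        // the newest timestamp among its reported block replicas.  This
        // misfires both ways: a metadata-only change to the file trips it
        // falsely, while clock skew between the namenode and datanodes can
        // mask a genuinely stale replica.
        static boolean looksCorrupt(long fileMtime, long[] replicaTimestamps) {
            long newest = Long.MIN_VALUE;
            for (long t : replicaTimestamps) {
                newest = Math.max(newest, t);
            }
            return fileMtime > newest;
        }
    }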

> Append to files in HDFS
> -----------------------
>
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
>
> A request to be able to append to files in HDFS has been raised a couple of 
> times on the list of late.  For one example, see 
> http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
>   Other mail describes folks' workarounds because this feature is lacking: 
> e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480 
> (Later in this thread, Jim Kellerman re-raises the HBase need for this 
> feature).  HADOOP-337 'DFS files should be appendable' makes mention of file 
> append, but it was opened early in the life of HDFS when the focus was more 
> on implementing the basics than on adding new features.  Interest fizzled.  
> Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation and 
> being able to concurrently read/write -- rather than try to breathe new life 
> into it, here is a new issue focused on file append.  Ultimately, being able 
> to do as the Google GFS paper describes -- having multiple concurrent 
> clients make 'Atomic Record Append's to a single file -- would be sweet, but 
> at least for a first cut at this feature, IMO, a single client appending to 
> a single HDFS file, with the application managing the access, would be 
> sufficient.
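
For reference, a minimal sketch of what the requested single-writer append 
might look like from the client side, assuming a FileSystem.append(Path) entry 
point that does not exist at the time of this issue; the path and record are 
made up.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendExample {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/logs/events.log");  // made-up path

            // One writer at a time; the application coordinates access itself.
            FSDataOutputStream out = fs.append(file);  // hypothetical API
            try {
                out.writeBytes("one more record\n");
            } finally {
                out.close();
            }
        }
    }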

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
