[ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523991 ]

Doug Cutting commented on HADOOP-1700:
--------------------------------------

> A revision number update can simply be recorded in memory.

So the namenode wouldn't persist the block version number, it would just keep 
track of the highest revision that's yet been reported to it?  If for any 
reason all of the replicas are not updated to the latest revision, the namenode 
would still need to issue replication and deletion commands, right?  But I 
think you're arguing that in what's hopefully the common case, when all 
replicas are successfully modified, the namenode need only increment its 
in-memory block revision number and would not need to issue any other commands 
nor persist any data.  That would indeed be a namenode performance advantage to 
this approach.
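The bookkeeping described above could be sketched roughly as follows. This is only an illustration of the idea under discussion, not actual Hadoop code: the class and method names (BlockRevisionTracker, reportReplica) are hypothetical, and a real namenode would integrate this with block reports and the replication monitor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the in-memory revision scheme discussed above.
// Not a Hadoop API; names and structure are illustrative only.
public class BlockRevisionTracker {
    // Highest revision reported per block id; kept in memory only,
    // never persisted in the namenode image or edit log.
    private final Map<Long, Integer> latestRevision = new HashMap<>();

    /**
     * Record a replica report. Returns true if the replica is stale,
     * i.e. the namenode would need to issue replication/deletion
     * commands for it; returns false in the common case where the
     * replica is at (or ahead of) the latest known revision.
     */
    public boolean reportReplica(long blockId, int revision) {
        int latest = latestRevision.getOrDefault(blockId, 0);
        if (revision > latest) {
            // New highest revision seen: an in-memory update only,
            // no persistence and no commands to datanodes.
            latestRevision.put(blockId, revision);
            return false;
        }
        // Equal revision: replica is up to date. Lower revision: stale.
        return revision < latest;
    }

    public static void main(String[] args) {
        BlockRevisionTracker t = new BlockRevisionTracker();
        // Common case: all replicas report the new revision.
        System.out.println(t.reportReplica(42L, 1)); // false
        System.out.println(t.reportReplica(42L, 1)); // false
        // A lagging replica reports the old revision and must be fixed.
        System.out.println(t.reportReplica(42L, 0)); // true
    }
}
```

Note that because latestRevision lives only in memory, a namenode restart forgets it entirely, which is exactly the edge case raised below.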

One cost of this approach is that block revision numbers would consume more 
memory per block on the namenode, something we wish to minimize.  So there is 
something of a time/space tradeoff.

Some edge semantics would be different too.  If only one replica is updated, 
and its datanode dies, and the namenode is restarted, then the change would 
silently be lost, no?  With new block ids, this would result in a file with a 
missing block that could not be read.

> Append to files in HDFS
> -----------------------
>
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
>
> Request for being able to append to files in HDFS has been raised a couple of 
> times on the list of late.   For one example, see 
> http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
>   Other mail describes folks' workarounds because this feature is lacking: 
> e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480 
> (Later on this thread, Jim Kellerman re-raises the HBase need of this 
> feature).  HADOOP-337 'DFS files should be appendable' makes mention of file 
> append but it was opened early in the life of HDFS when the focus was more on 
> implementing the basics rather than adding new features.  Interest fizzled.  
> Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation and 
> being able to concurrently read/write -- rather than try and breathe new life 
> into HADOOP-337, instead, here is a new issue focused on file append.  
> Ultimately, being able to do as the google GFS paper describes -- having 
> multiple concurrent clients making 'Atomic Record Append' to a single file 
> would be sweet but at least for a first cut at this feature, IMO, a single 
> client appending to a single HDFS file letting the application manage the 
> access would be sufficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
