[ https://issues.apache.org/jira/browse/HADOOP-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524197 ]

Sameer Paranjpye commented on HADOOP-1700:
------------------------------------------

> As you observe, this also can happen if, e.g., a datanode is offline when a 
> file is deleted, etc. So I don't think using new blockids to implement 
> appends really makes this 
> that much more likely.

You're right, it doesn't make it much more likely. But it is a problem that 
exists in the system today, and one we likely need a solution for. I don't 
know the extent of the problem, because on our installations a node that's 
down for some time usually comes back wiped clean. However, in order to solve 
this problem we need a disambiguating marker on replicas that distinguishes 
deleted replicas from those currently in use. A checksum would work if it were 
persisted on the Namenode: the Namenode would only accept replicas whose 
checksum matched what it had on record. A timestamp would work as well and 
wouldn't need to be persisted: the Namenode would treat the most recent 
replicas as valid.

I'm suggesting that a revision number on a block, taking the form of a 
timestamp, can resolve this issue as well as be used to support appends.
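
To make the timestamp variant concrete, the Namenode only has to compare the 
stamps that datanodes report for a block and keep the newest ones. The sketch 
below is purely illustrative; the class and field names (NewestReplicaFilter, 
ReportedReplica) are made up for this comment and are not existing HDFS code.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative namenode-side selection of valid replicas by revision
    // stamp; the type and field names are hypothetical, not existing HDFS code.
    class NewestReplicaFilter {

        static class ReportedReplica {
            final String datanode;
            final long revisionStamp;   // stamp the datanode stored with its copy

            ReportedReplica(String datanode, long revisionStamp) {
                this.datanode = datanode;
                this.revisionStamp = revisionStamp;
            }
        }

        /**
         * Treat only the replicas carrying the most recent revision stamp as
         * valid; anything older (e.g. a copy from a node that was offline
         * during a delete or rewrite) is stale and can be discarded.
         */
        static List<ReportedReplica> validReplicas(List<ReportedReplica> reported) {
            long newest = Long.MIN_VALUE;
            for (ReportedReplica r : reported) {
                newest = Math.max(newest, r.revisionStamp);
            }
            List<ReportedReplica> valid = new ArrayList<ReportedReplica>();
            for (ReportedReplica r : reported) {
                if (r.revisionStamp == newest) {
                    valid.add(r);
                }
            }
            return valid;
        }
    }

Since only the relative order of the stamps matters, nothing here would need 
to be persisted across a Namenode restart.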

> How would it help in the situation I described above, where only a single 
> replica is updated [ ... ]

We need the right protocol here. If we have a 2-phase protocol where a writer 
first updates the revision number on all replicas before it starts writing, 
then this issue doesn't arise. If one replica updates its revision and then 
dies, and the Namenode restarts, the Namenode wouldn't see the replica on the 
dead node and would accept the remaining 2 replicas as valid (assuming a 
replication of 3). In the event of a replica failing, the client would 
re-apply a revision change before writing, now using a different timestamp, 
and the two remaining replicas would get updated. If the dead datanode then 
came back, its copy would be rejected. There are lots of corner cases here, 
but I believe they can be resolved such that we never silently lose data.
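
Here is a rough outline of what I mean by the 2-phase write, seen from the 
client side. The interface and method names (DatanodeStub, bumpRevision, 
append) are hypothetical, just to show the shape of the protocol; they are not 
existing HDFS APIs.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    // Illustrative client-side outline of the 2-phase write; all names here
    // are hypothetical, not existing HDFS APIs.
    class TwoPhaseAppendSketch {

        interface DatanodeStub {
            // Phase 1: record the new revision stamp for the block; returns
            // false if the datanode refuses (e.g. it never held the block).
            boolean bumpRevision(long blockId, long newStamp) throws IOException;

            // Phase 2: append data to the replica, tagged with the phase-1 stamp.
            void append(long blockId, long stamp, byte[] data) throws IOException;
        }

        void appendToBlock(long blockId, List<DatanodeStub> replicas, byte[] data)
                throws IOException {
            while (true) {
                long stamp = System.currentTimeMillis();   // fresh revision for this attempt
                List<DatanodeStub> live = new ArrayList<DatanodeStub>();

                // Phase 1: bump the revision on every reachable replica before writing.
                for (DatanodeStub d : replicas) {
                    try {
                        if (d.bumpRevision(blockId, stamp)) {
                            live.add(d);
                        }
                    } catch (IOException e) {
                        // Replica is down: it keeps the old stamp and will be
                        // rejected by the Namenode if it ever comes back.
                    }
                }
                if (live.isEmpty()) {
                    throw new IOException("no replicas reachable for block " + blockId);
                }

                // Phase 2: write only to the replicas that accepted the new revision.
                try {
                    for (DatanodeStub d : live) {
                        d.append(blockId, stamp, data);
                    }
                    return;                                // success on all live replicas
                } catch (IOException e) {
                    // A replica failed mid-write; retry from phase 1 with a fresh
                    // stamp. The failed replica's copy is now stale and rejected.
                    replicas = live;
                }
            }
        }
    }

The property we care about is that a replica that misses the phase-1 bump can 
never be mistaken for a current copy, because its stamp is older than the one 
the live replicas carry.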






> Append to files in HDFS
> -----------------------
>
>                 Key: HADOOP-1700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1700
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: stack
>
> Request for being able to append to files in HDFS has been raised a couple of 
> times on the list of late.   For one example, see 
> http://www.nabble.com/HDFS%2C-appending-writes-status-tf3848237.html#a10916193.
>   Other mail describes folks' workarounds because this feature is lacking: 
> e.g. http://www.nabble.com/Loading-data-into-HDFS-tf4200003.html#a12039480 
> (Later on this thread, Jim Kellerman re-raises the HBase need of this 
> feature).  HADOOP-337 'DFS files should be appendable' makes mention of file 
> append but it was opened early in the life of HDFS when the focus was more on 
> implementing the basics rather than adding new features.  Interest fizzled.  
> Because HADOOP-337 is also a bit of a grab-bag -- it includes truncation and 
> being able to concurrently read/write -- rather than try and breathe new life 
> into HADOOP-337, instead, here is a new issue focused on file append.  
> Ultimately, being able to do as the google GFS paper describes -- having 
> multiple concurrent clients making 'Atomic Record Append' to a single file 
> would be sweet but at least for a first cut at this feature, IMO, a single 
> client appending to a single HDFS file letting the application manage the 
> access would be sufficient.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
