On Tue, Mar 20, 2012 at 5:37 PM, Eli Collins <e...@cloudera.com> wrote:
> Append introduces non-trivial design and code complexity, which is not
> worth the cost if we don't have real users.

The bulk of the complexity of HDFS-265 ("the new Append") was around
Hflush, concurrent readers, the pipeline, etc. The code and complexity
for appending to a previously closed file was not that large.

> Removing append means we have the property that HDFS blocks, when
> finalized, are immutable. This significantly simplifies the design and
> code, which significantly simplifies the implementation of other
> features like snapshots, HDFS-level caching, dedupe, etc.

While snapshots are challenging with append, the problem is solvable:
the snapshot needs to remember the length of the file at snapshot time.
(We have a working prototype; we will be posting the design and the code
soon.)

I agree that the notion of an immutable file is useful, since it lets
the system and tools optimize certain things. A Xerox PARC file system
in the '80s had this feature, which the system exploited. I would
support adding the notion of an immutable file to Hadoop.

sanjay
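The length-based snapshot idea can be sketched in a few lines. This is a toy
illustration, not HDFS code: the class and method names (LengthSnapshotDemo,
takeSnapshot, readViaSnapshot) are hypothetical. The point is that a snapshot
of an append-only store need only record each file's length; reads through the
snapshot are capped at that length, so appends made after the snapshot stay
invisible.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: a snapshot of an append-only store records only the
// length of each file at snapshot time; reads via the snapshot are
// truncated at that recorded length.
public class LengthSnapshotDemo {
    // Hypothetical append-only "file system": path -> growable bytes.
    static final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    static void append(String path, byte[] data) throws IOException {
        files.computeIfAbsent(path, p -> new ByteArrayOutputStream()).write(data);
    }

    // A snapshot is just a map of path -> length at snapshot time.
    static Map<String, Integer> takeSnapshot() {
        Map<String, Integer> snap = new HashMap<>();
        files.forEach((path, buf) -> snap.put(path, buf.size()));
        return snap;
    }

    // Reading through the snapshot stops at the recorded length, so
    // bytes appended after the snapshot are not visible to the reader.
    static byte[] readViaSnapshot(Map<String, Integer> snap, String path) {
        byte[] all = files.get(path).toByteArray();
        int len = snap.get(path);
        byte[] out = new byte[len];
        System.arraycopy(all, 0, out, 0, len);
        return out;
    }

    public static void main(String[] args) throws IOException {
        append("/logs/a", "hello".getBytes());
        Map<String, Integer> snap = takeSnapshot();
        append("/logs/a", " world".getBytes()); // append after snapshot
        // Snapshot reader sees "hello"; a live reader sees "hello world".
        System.out.println(new String(readViaSnapshot(snap, "/logs/a")));
        System.out.println(files.get("/logs/a").toString());
    }
}
```

Because finalized blocks only ever grow at the tail, truncating at the recorded
length is enough to give the snapshot reader a consistent view; no copy-on-write
of block data is needed for the append case.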