On Tue, Mar 20, 2012 at 5:37 PM, Eli Collins <e...@cloudera.com> wrote:

>
>
> Append introduces non-trivial design and code complexity, which is not
> worth the cost if we don't have real users.

The bulk of the complexity of HDFS-265 ("the new Append") was around
hflush, concurrent readers, the pipeline, etc. The code and complexity for
appending to a previously closed file was not that large.



> Removing append means we
> have the property that HDFS blocks, when finalized, are immutable.
> This significantly simplifies the design and code, which significantly
> simplifies the implementation of other features like snapshots,
> HDFS-level caching, dedupe, etc.
>

While snapshots are challenging with append, the problem is solvable: the
snapshot needs to remember the length of the file. (We have a working
prototype; we will be posting the design and the code soon.)


I agree that the notion of an immutable file is useful, since it lets the
system and tools optimize certain things. A Xerox PARC file system in the
'80s had this feature, and the system exploited it. I would support adding
the notion of an immutable file to Hadoop.


sanjay
