On Mar 20, 2012, at 7:37 PM, Eli Collins wrote:
> Hey gang,
> 
> I'd like to get people's thoughts on the following proposal. I think
> we should consider removing append from HDFS.
> 
> Where we are today.. append was added in the 0.17-19 releases
> (HADOOP-1700) and subsequently disabled (HADOOP-5224) due to quality
> issues. It and sync were re-designed, re-implemented, and shipped in
> 21.0 (HDFS-265). To my knowledge, there has been no real production
> use. Anecdotally people who worked on branch-20-append have told me
> they think the new trunk code is substantially less well-tested than
> the branch-20-append code (at least for sync, append was never well
> tested). It has certainly gotten way less pounding from HBase users.
> The design however, is much improved, and people think we can get
> hsync (and append) stabilized in trunk (mostly testing and bug
> fixing).

Up front:  I think append is a needed feature.

Politely speaking, I think the premise of the question is a bit dubious because
it is circular, i.e., "It's not used in production, so is it worth keeping?"  The
stigma/perception that append has been unstable and is not well-tested is itself
a compelling reason not to run it in production at major installations.  The
situation is going to be akin to "You go first. No, you go first!  No way, you
go first!".

Downstream projects also aren't going to use something until it's stable, so
they either work around the limitation, or...  they choose something other than
hdfs.  There's also the unanswerable question of how many potential users have
been silently lost.  We are unlikely to have heard the demand from those who
chose another solution.  Generally, for every complaint/request we do hear, many
more people didn't even bother to speak up.

I envision a day when hdfs is a performant POSIX filesystem.  Dropping append
sets us back from that goal.  Admittedly, I don't know all the intricacies of 
how append was implemented and why it is/was difficult.  Is the complexity 
maybe due to "bolting" append onto code that wasn't designed with mutability in 
mind?  (That's truly a question, not a statement) If so, perhaps a refactoring 
would simplify the code?

Dropping append also might be used as a cudgel against hdfs.  Cynically
speaking, do we want to risk marketeers from certain competitors saying or
implying:  Trust your data with us because we're so brilliant that we have a
feature hdfs has repeatedly tried and failed to implement!

Daryn
