On Fri, Jan 14, 2011 at 10:25 AM, Eric Baldeschwieler <[email protected]> wrote:
> 2) append is hard. It is so hard we rewrote the entire write pipeline (5
> person-years work) in trunk after giving up on the codeline you are
> suggesting we merge in. That work is what distinguishes all post 20 releases
> from 20 releases in my mind. I dont trust the 20 append code line. We've been
> hurt badly by it. We did the rewrite only after losing a bunch of production
> data a bunch of times with the previous code line. I think the various 20
> append patch lines may be fine for specialized hbase clusters, but they
> doesn't have the rigor behind them to bet your business in them.
>
Eric: A few comments on the above:

+ Append has had a bunch of work done on it since the Y! dataloss of a few years ago on an ancestor of the branch-0.20-append codebase (IIRC the issue you refer to in particular -- the 'dataloss' because partially written blocks were kept in tmp dirs, and on cluster restart, tmp data was cleared -- has been fixed in branch-0.20-append).

+ You may not trust 0.20-append (or its close cousin over in CDH) but a bunch of HBasers do. On the one hand, we have little choice: until the *new* append becomes available in a stable Hadoop, the HBase project has to sustain itself (what do you think, 3-6 months before we see 0.22? HBase can't hold its breath that long). On the other hand, the branch-0.20-append work has been carried out by lads (and lasses!) who know their HDFS. It's true that it will not have been tested with Y! rigor, but near-derivatives -- CDH and the FB branches -- already run HDFS-200-based append in production.

St.Ack

P.S. Don't get me wrong. HBase is looking forward to *new* append. We just need something to suck on meantime.
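[For list readers who missed the earlier thread: the restart dataloss mechanism mentioned above -- in-progress block files staged in a tmp dir that gets wiped on restart -- can be sketched as below. This is an illustrative toy only, not HDFS code; all class and file names are hypothetical.]

```python
# Toy illustration (NOT HDFS code) of why staging in-progress writes in a
# tmp dir that is cleared on restart loses the tail of an open file.
import os
import shutil
import tempfile

class ToyDataNode:
    """Stages blocks being written under tmp/; a block is moved to
    current/ only when the writer finalizes it."""

    def __init__(self, root):
        self.tmp = os.path.join(root, "tmp")
        self.current = os.path.join(root, "current")
        os.makedirs(self.tmp, exist_ok=True)
        os.makedirs(self.current, exist_ok=True)

    def write_block(self, name, data):
        # An open, partially written block lives only in tmp/.
        with open(os.path.join(self.tmp, name), "wb") as f:
            f.write(data)

    def finalize_block(self, name):
        # Only a clean close promotes the block to durable storage.
        os.rename(os.path.join(self.tmp, name),
                  os.path.join(self.current, name))

    def restart(self):
        # The old behavior: on startup, anything in tmp/ is treated as
        # garbage and deleted -- including a still-open append target.
        shutil.rmtree(self.tmp)
        os.makedirs(self.tmp)

    def blocks(self):
        return sorted(os.listdir(self.current))

root = tempfile.mkdtemp()
dn = ToyDataNode(root)
dn.write_block("blk_1", b"finalized data")
dn.finalize_block("blk_1")
dn.write_block("blk_2", b"open block, e.g. the tail of an HBase WAL")
dn.restart()
surviving = dn.blocks()
print(surviving)  # blk_2 is gone: ['blk_1']
shutil.rmtree(root)
```

The branch-0.20-append fix, roughly, is to stop treating everything under the staging area as disposable, so an unfinalized block can be recovered after restart instead of discarded.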
