On Fri, Jan 14, 2011 at 10:25 AM, Eric Baldeschwieler
<[email protected]> wrote:
> 2) append is hard. It is so hard we rewrote the entire write pipeline (5 
> person-years work) in trunk after giving up on the codeline you are 
> suggesting we merge in. That work is what distinguishes all post 20 releases 
> from 20 releases in my mind. I don't trust the 20 append code line. We've been 
> hurt badly by it.  We did the rewrite only after losing a bunch of production 
> data a bunch of times with the previous code line.  I think the various 20 
> append patch lines may be fine for specialized hbase clusters, but they 
> don't have the rigor behind them to bet your business on them.
>

Eric:

A few comments on the above:

+ Append has had a bunch of work done on it since the Y! dataloss of a
few years ago on an ancestor of the branch-0.20-append codebase. IIRC,
the issue you refer to in particular -- the 'dataloss' where partially
written blocks were kept in tmp dirs and the tmp data was cleared on
cluster restart -- has been fixed in branch-0.20-append.
+ You may not trust 0.20-append (or its close cousin over in CDH) but
a bunch of HBasers do. On the one hand, we have little choice.  Until
the *new* append becomes available in a stable Hadoop, the HBase
project has had to sustain itself (what do you think, 3-6 months before
we see 0.22?  The HBase project can't hold its breath that long).  On
the other hand, the branch-0.20-append work has been carried out by
lads (and lasses!) who know their HDFS.  It's true that it has not
been tested with Y! rigor, but near-derivatives -- CDH or the FB
branches -- already run HDFS-200-based append in production.

St.Ack
P.S. Don't get me wrong.  HBase is looking forward to *new* append.
We just need something to suck on in the meantime.
