I want to talk about sync() in HDFS for a bit... I had a cluster crash, OOMEs out the butt, 17/19 machines were dead when I got to the scene.
What I found was that in .META. there were 2-3x as many regions as were actually on disk, plus tons of older entries from parent splits. Looks like a bunch of updates and deletes weren't persisted. And by a bunch, I mean a SHIT TON. It was insane. I had to write HbaseFsck.java as an experiment to recover without rm -rf /hbase.

So, what will be in hadoop-0.20 to minimize this kind of horrible data loss? Is this the 'sync()' call that is on-again-off-again reliable? What about append? Do we really need append? Syncing an open file to persist data is good enough, no? (A rough sketch of what I mean by that is at the bottom of this mail.)

-ryan

On Thu, Apr 2, 2009 at 5:34 PM, Jim Kellerman (POWERSET) <[email protected]> wrote:
> > -----Original Message-----
> > From: Erik Holstad [mailto:[email protected]]
> > Sent: Thursday, April 02, 2009 5:09 PM
> > To: [email protected]
> > Subject: Re: thinking about hbase 0.20
> >
> > So the way I see it, from our point of view, we can probably get 0.20 out
> > the door a week after that meeting, so maybe a week and a half after Stack
> > gets back.
>
> We still have to wait for hadoop-0.20, which has no release candidate yet.
> However, pushing tasks out is still a good idea so that we can spend the
> time between the hadoop-0.20 release candidate and hbase-0.20 fixing issues,
> which I'm certain we will find. All in all this should result in a more
> timely and stable release for hbase-0.20.
>
> -Jim
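
P.S. Here is the rough sketch I mentioned above, just to pin down what I mean by "syncing an open file". It is written against the FSDataOutputStream.sync() call in the hadoop-0.20 client API; the class name and path are made up for illustration, and whether sync() actually gets the bytes safely onto the datanodes before close() is exactly the question I'm asking:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class SyncSketch {                       // made-up class name
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);

      // Open a write-ahead-log-style file and keep it open, the way a
      // region server keeps its HLog open.
      Path log = new Path("/tmp/sync-sketch-log"); // made-up path
      FSDataOutputStream out = fs.create(log);

      // Write an edit...
      out.writeBytes("put row1 ...\n");

      // ...and ask HDFS to persist it without closing the file. This is
      // the sync() whose reliability is in question: do these bytes
      // survive losing most of the cluster before close()?
      out.sync();

      // Deliberately not calling out.close(): the data-loss scenario is
      // exactly edits written to a file that was still open at crash time.
    }
  }

If sync() on an open file gives that guarantee, then append (reopening a closed file and continuing to write to it) looks like a separate and, for us, lesser need.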
