Re: When does a row become highly available?

Seth Ladd Fri, 11 Dec 2009 13:16:53 -0800

Thanks for the open and informative reply. Looking forward to testing
0.21 when available!

On Dec 11, 2009, at 11:36 AM, Andrew Purtell <[email protected]> wrote:

Currently HDFS does not guarantee that a write is fully replicated
before a sync() call completes. The problem is that the write appears to
complete from the client's perspective -- HBase completes the write
RPC -- but really it should be blocked for some further period of time.
The client won't get a failure indication when it should, so it cannot
know that it must retry the write. There are configuration options which
can narrow this window but, until HDFS has a working sync(), cannot
close it shut tight.
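
To make the window concrete, here is a minimal toy model in Python. It is
not HDFS or HBase code: the ToyLog class, its durable_sync flag, and the
write_then_crash helper are all hypothetical names invented for this
sketch. It only illustrates the difference between a sync() that returns
after the edits reach the replicas and one that returns while they are
still buffered in the writer.

```python
class ToyLog:
    """Toy stand-in for a log file; NOT the real HDFS client API."""

    def __init__(self, durable_sync):
        self.durable_sync = durable_sync
        self.buffered = []    # accepted by the writer, not yet on any replica
        self.replicated = []  # actually stored on the replicas

    def append(self, edit):
        self.buffered.append(edit)

    def sync(self):
        # With a working sync(), returning means the edits are replicated.
        # Without one, the call returns while the edits are still buffered.
        if self.durable_sync:
            self.replicated.extend(self.buffered)
            self.buffered.clear()

    def crash(self):
        # The writer dies: whatever was only buffered is gone.
        lost = list(self.buffered)
        self.buffered.clear()
        return lost


def write_then_crash(durable_sync):
    """Write three edits, acknowledge each after sync(), then crash."""
    log = ToyLog(durable_sync)
    acked = []
    for edit in ["row1", "row2", "row3"]:
        log.append(edit)
        log.sync()
        acked.append(edit)  # the client is told the write succeeded
    lost = log.crash()
    return acked, log.replicated, lost


if __name__ == "__main__":
    for flag in (False, True):
        acked, replicated, lost = write_then_crash(flag)
        print(f"durable_sync={flag}: acked={acked} "
              f"replicated={replicated} lost={lost}")
```

In the non-durable case every edit is acknowledged and then lost, which is
the situation where the client should have seen a failure and retried.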

HBase is a "special" client of HDFS in many respects, so while this is
obviously really important for us, it is not so for the majority of
HDFS users, who run mapreduce jobs on it. HDFS-level failures leading to
data loss result in task retries and recreation of any temporary data
lost, no harm done. So it has been some time coming. Getting a working
sync() in Hadoop 0.21 is finally going to happen for us.

  - Andy





________________________________
From: Jean-Daniel Cryans <[email protected]>
To: [email protected]
Sent: Fri, December 11, 2009 10:59:55 AM
Subject: Re: When does a row become highly available?

That's the not-so-working HDFS append feature showing its ugly face:
small amounts of data can be lost (configurable max of ~62MB).

J-D
On Fri, Dec 11, 2009 at 10:55 AM, Seth Ladd <[email protected]> wrote:
Which confuses me, if the write goes straight to a RegionServer, but
then the RegionServer fails before the MemStore is flushed, did I just
lose data?
No, that's the goal of the write-ahead-log (WAL).
Here's the scenario I just tested on my EC2 cluster.  3 Zookeeper
instances, 1 master, and 3 slaves.

I created a table, and inserted a single row.
I performed a read (get) to test the insert, and sure enough the row
was returned.
I then noted which slave held the table, and terminated the slave via
the AWS management console.
I then waited approx 30 seconds.
I used the web interfaces (port 60030 and 60010) to note that the
region was indeed moved to another slave.
I performed a read on the same row, but did *not* find the row.
So it looks like the region for the table was moved, but no data
was moved over.
Was that a valid test? I would expect the row to get moved with
the region.
Thanks,
Seth
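
The test above can be pictured with a small Python sketch of what a
write-ahead log is meant to provide. This is a toy, not HBase code:
ToyRegionServer and its recover method are hypothetical names, and a
plain list stands in for the log file on HDFS. A put is logged before it
reaches the in-memory store, and a replacement server rebuilds its
in-memory state by replaying whatever part of the log survived.

```python
class ToyRegionServer:
    """Toy model of the WAL-then-MemStore write path; NOT the HBase API."""

    def __init__(self, wal):
        self.wal = wal      # list standing in for the log file on HDFS
        self.memstore = {}  # in-memory edits, lost if the server dies

    def put(self, row, value):
        self.wal.append((row, value))  # log the edit first
        self.memstore[row] = value     # then apply it in memory

    def get(self, row):
        return self.memstore.get(row)

    @classmethod
    def recover(cls, surviving_wal):
        # A replacement server replays the surviving log entries in order.
        server = cls(list(surviving_wal))
        for row, value in surviving_wal:
            server.memstore[row] = value
        return server


if __name__ == "__main__":
    wal = []
    first = ToyRegionServer(wal)
    first.put("row1", "value1")

    # Case 1: the log entry survived the crash, so the row comes back.
    print(ToyRegionServer.recover(wal).get("row1"))

    # Case 2: the log tail never reached the replicas, so the row is gone.
    print(ToyRegionServer.recover([]).get("row1"))
```

The second case corresponds to the outcome of the test: the region is
reassigned, but the single unflushed row cannot be replayed because its
log entry was never durably stored.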
