Thanks for the tip!  If 0.21 has the "durable rows" fix, then I'd love
to try it.  Without it, it'll be hard to recommend HBase (though
everything else so far is a great fit).

I'm using the bundled EC2 scripts, which default to existing AMIs.
This will be a good opportunity to try building the AMIs myself with
those scripts.

Thanks again; once I have it up and tested I'll let the list know how it goes,
Seth

On Fri, Dec 11, 2009 at 1:21 PM, Jean-Daniel Cryans <[email protected]> wrote:
> You can already do it if you want. Hadoop's 0.21 branch is pretty
> stable since they are QA'ing it at the moment. HBase's trunk (which
> will become 0.21) is also quite stable so you could easily do the same
> test.
>
> J-D
>
> On Fri, Dec 11, 2009 at 1:16 PM, Seth Ladd <[email protected]> wrote:
>> Thanks for the open and informative reply. Looking forward to testing 0.21
>> when available!
>>
>> On Dec 11, 2009, at 11:36 AM, Andrew Purtell <[email protected]> wrote:
>>
>>> Currently HDFS does not guarantee that a write is fully replicated before
>>> a sync() call completes. The problem is that the write appears to complete
>>> from the client's perspective -- HBase completes the write RPC -- but
>>> really it should block for some further period of time. The client won't
>>> get a failure indication when it should, so it cannot know it must retry
>>> the write. There are configuration options which can narrow this window,
>>> but until HDFS has a working sync() they cannot close it shut tight.
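>>>
>>> To make the window concrete, here is a minimal sketch against the
>>> 0.20-era HDFS client API; the path and payload are made up for
>>> illustration:
>>>
>>> import java.io.IOException;
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.FSDataOutputStream;
>>> import org.apache.hadoop.fs.FileSystem;
>>> import org.apache.hadoop.fs.Path;
>>>
>>> public class SyncWindowSketch {
>>>   public static void main(String[] args) throws IOException {
>>>     FileSystem fs = FileSystem.get(new Configuration());
>>>     // Hypothetical log path, for illustration only.
>>>     FSDataOutputStream out = fs.create(new Path("/tmp/example-wal"));
>>>     out.write("some edit".getBytes());
>>>     // Today sync() can return before the bytes are durably
>>>     // replicated, so a crash right here can still lose the write
>>>     // even though no error was reported to the caller.
>>>     out.sync();
>>>     out.close();
>>>   }
>>> }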
>>>
>>> HBase is a "special" client of HDFS in many respects, so while this is
>>> obviously really important for us, it is not so for the majority of HDFS
>>> users, who run MapReduce jobs on it. For them, HDFS-level failures leading
>>> to data loss just result in task retries and recreation of any temporary
>>> data lost, no harm done. So this fix has been some time coming. Getting a
>>> working sync() into Hadoop 0.21 is finally going to happen for us.
>>>
>>>  - Andy
>>>
>>> ________________________________
>>> From: Jean-Daniel Cryans <[email protected]>
>>> To: [email protected]
>>> Sent: Fri, December 11, 2009 10:59:55 AM
>>> Subject: Re: When does a row become highly available?
>>>
>>> That's the not-so-working HDFS append feature showing its ugly face:
>>> small amounts of data can be lost (a configurable maximum of roughly
>>> 62MB, since the log is rolled once it approaches the default 64MB HDFS
>>> block size).
>>>
>>> J-D
>>>
>>> On Fri, Dec 11, 2009 at 10:55 AM, Seth Ladd <[email protected]> wrote:
>>>>>>
>>>>>> Which confuses me, if the write goes straight to a RegionServer, but
>>>>>> then the RegionServer fails before the MemStore is flushed, did I just
>>>>>> lose data?
>>>>>
>>>>> No, that's the goal of the write-ahead log (WAL).
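>>>>>
>>>>> As a small sketch of what the client side looks like (against the
>>>>> 0.20 Java API, if memory serves; the table and column names are
>>>>> made up), every Put is logged before it lands in the MemStore:
>>>>>
>>>>> import java.io.IOException;
>>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>>> import org.apache.hadoop.hbase.client.HTable;
>>>>> import org.apache.hadoop.hbase.client.Put;
>>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>>>
>>>>> public class WalPutSketch {
>>>>>   public static void main(String[] args) throws IOException {
>>>>>     HTable table = new HTable(new HBaseConfiguration(), "mytable");
>>>>>     Put put = new Put(Bytes.toBytes("row1"));
>>>>>     put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
>>>>>         Bytes.toBytes("val"));
>>>>>     // true is the default: the edit is appended to the WAL before
>>>>>     // it goes into the MemStore, so a crashed regionserver's edits
>>>>>     // can be replayed -- provided the WAL itself is durable in HDFS.
>>>>>     put.setWriteToWAL(true);
>>>>>     table.put(put);
>>>>>   }
>>>>> }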
>>>>
>>>> Here's the scenario I just tested on my EC2 cluster: 3 ZooKeeper
>>>> instances, 1 master, and 3 slaves.
>>>>
>>>> 1. I created a table and inserted a single row.
>>>> 2. I performed a read (get) to verify the insert, and sure enough
>>>> the row was returned.
>>>> 3. I noted which slave held the table, and terminated that slave via
>>>> the AWS management console.
>>>> 4. I waited approximately 30 seconds.
>>>> 5. I used the web interfaces (ports 60030 and 60010) to confirm that
>>>> the region had indeed moved to another slave.
>>>> 6. I performed a read on the same row, but did *not* find the row.
>>>>
>>>> So it looks like the region for the table was moved, but no data was
>>>> moved over.
>>>>
>>>> Was that a valid test?  I would expect the row to get moved with the
>>>> region.
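>>>>
>>>> In case it helps anyone reproduce this, the client side was roughly
>>>> the following (a sketch against the 0.20 Java API; the table and
>>>> column names are placeholders):
>>>>
>>>> import java.io.IOException;
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.hbase.client.Get;
>>>> import org.apache.hadoop.hbase.client.HTable;
>>>> import org.apache.hadoop.hbase.client.Put;
>>>> import org.apache.hadoop.hbase.client.Result;
>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>>
>>>> public class RegionFailoverTest {
>>>>   public static void main(String[] args) throws IOException {
>>>>     HTable table = new HTable(new HBaseConfiguration(), "testtable");
>>>>
>>>>     // Insert a single row, then read it back.
>>>>     Put put = new Put(Bytes.toBytes("row1"));
>>>>     put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
>>>>         Bytes.toBytes("value"));
>>>>     table.put(put);
>>>>     System.out.println(table.get(new Get(Bytes.toBytes("row1"))));
>>>>
>>>>     // ... terminate the hosting regionserver and wait for the
>>>>     // region to be reassigned, then read the same row again ...
>>>>     Result after = table.get(new Get(Bytes.toBytes("row1"))));
>>>>     System.out.println(after); // came back empty in my test
>>>>   }
>>>> }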
>>>>
>>>> Thanks,
>>>> Seth
>>>>
>>>
>>>
>>>
>>
>
