Hi,

Planning on being there (short of any fires). Things are going relatively well. I've learned quite a bit about massaging the setup during high-load (write-intensive) jobs. Everything seems to be working well now, and HBase is going to be a pivotal part of my project.
See you tomorrow!
Danny

On Mon, May 19, 2008 at 2:17 PM, stack <[EMAIL PROTECTED]> wrote:
> Hey Daniel. How are things going over there? You coming to the user group meeting tomorrow evening?
> St.Ack
>
> Daniel Leffel wrote:
>
>> Hi All (and St.Ack),
>> I've spent the last few weeks figuring out how to use HBase for my project. HBase at its surface has seemed like the dream solution for this project and had me very excited from the beginning.
>>
>> However, from the moment I began to implement the project, I've had a frustrating go at it. I've spent weeks simply trying to construct the environment under which my application will need to run. I've sent countless messages to this group (and thank you all so much for answering so many of them, especially St.Ack).
>>
>> At this point, I can't tell which one(s) of the following is true:
>>
>> - Maybe I'm just a freaking idiot
>> - Maybe HBase is just not equipped to do what I want it to do
>> - Maybe HBase is still too unstable and it will do what I need it to do at some point in the future
>> - Maybe I have the wrong expectations for the amount of hardware I need to throw at the situation
>>
>> I have Hadoop 0.16.3 running on 4 boxes (all 4 running DFS and 3 of them running MapRed). I'm running HBase 0.1.2 (the most recent release candidate), with the master running on the same box as the namenode and 3 region servers running on the same MapRed boxes.
>>
>> My first and very simple task is to load a sparse table with 220 million rows. The average row has 2 columns or so (very low byte count per row). I have attempted to do this with a simple MapReduce job: in the Map phase I parse a text file, and I use the standard TableReduce to load the table.
>>
>> I've attempted this with various numbers of reduce tasks and various configurations of which machines run each daemon.
>>
>> The end result is always the same: at some point, region servers go offline. The most recent behavior is that region servers simply stop responding, and logs set to debug give no useful information. If I had to guess, this looks like typical deadlock behavior.
>>
>> A simple table scan (just so I can find out how many rows were successfully inserted before all the region servers died) usually causes the same behavior: one by one, the region servers die, even with no MapRed jobs running.
>>
>> At this point, I'm at a crossroads and beginning to think I will need to leave HBase behind, because I can't spend another week with no progress on this project.
>>
>> So, I ask the question(s) I posed in the beginning:
>>
>> - Maybe I'm just a freaking idiot
>> - Maybe HBase is just not equipped to do what I want it to do
>> - Maybe HBase is still too unstable and it will do what I need it to do at some point in the future
>> - Maybe I have the wrong expectations for the amount of hardware I need to throw at the situation
>>
>> Can someone please point me in the right direction?
>>
>> Danny
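
For reference, here is a minimal sketch of the kind of text-file-to-HBase MapReduce load described in the quoted message above. It is only an illustration under stated assumptions, not the job Daniel ran: it uses the later org.apache.hadoop.hbase.mapreduce API (Put, TableMapReduceUtil) rather than the TableReduce class that shipped with HBase 0.1.2, it writes Puts directly from the map (a map-only load) instead of going through a reduce, and the table name "mytable", column family "d", and qualifier "col" are placeholders.

// Minimal sketch (assumed names: table "mytable", family "d", qualifier "col").
// Uses the later org.apache.hadoop.hbase.mapreduce API, not HBase 0.1.2's TableReduce.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class TextToHBaseLoad {

  // Map phase: parse one tab-separated line ("rowkey<TAB>value") and emit a Put for it.
  static class LoadMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    private static final byte[] FAMILY = Bytes.toBytes("d");      // assumed column family
    private static final byte[] QUALIFIER = Bytes.toBytes("col"); // assumed qualifier

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t", 2);
      if (fields.length < 2) {
        return; // skip malformed lines
      }
      byte[] row = Bytes.toBytes(fields[0]);
      Put put = new Put(row);
      put.addColumn(FAMILY, QUALIFIER, Bytes.toBytes(fields[1]));
      context.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "text-to-hbase-load");
    job.setJarByClass(TextToHBaseLoad.class);

    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setMapperClass(LoadMapper.class);

    // Point the output side at the HBase table; a null reducer plus zero
    // reduce tasks makes this a map-only load that writes Puts directly.
    TableMapReduceUtil.initTableReducerJob("mytable", null, job);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With reduce tasks set to zero, each map task writes its Puts straight to the region servers, which concentrates the write load on the cluster in much the same way the TableReduce-based job in the message above would.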
