Hi All (and St.Ack),

I've spent the last few weeks figuring out how to use HBase for my project. On the surface, HBase has seemed like the dream solution for this project, and it had me very excited from the beginning.
However, from the moment I began implementing the project, I've had a frustrating go of it. I've spent weeks simply trying to construct the environment in which my application will need to run. I've sent countless messages to this group (and thank you all so much for answering so many of them, especially St.Ack). At this point, I can't tell which of the following are true:

- Maybe I'm just a freaking idiot
- Maybe HBase is just not equipped to do what I want it to do
- Maybe HBase is just still too unstable and it will do what I need it to do at some point in the future
- Maybe I have the wrong expectations for the amount of hardware I need to throw at the situation

I have Hadoop 0.16.3 running on 4 boxes (all 4 running DFS and 3 of them running MapRed). I'm running HBase 0.1.2 (the most recent release candidate), with the master running on the same box as the namenode and 3 region servers running on the same boxes as MapRed.

My first and very simple task is to load a sparse table with 220 million rows. The average row has 2 columns or so (very low byte count per row). I have attempted to do this with a simple MapReduce job. In the map phase, I simply parse a text file, and I use the standard TableReduce to load the table. I've tried various numbers of reduce tasks and various configurations of which machines run each daemon.

The end result is always the same: at some point, the region servers go offline. The most recent behavior is that the region servers just quit responding, and the logs, even set to debug, give no useful information. If I had to guess, I'd say this is typical deadlock behavior. A simple table scan (just so I can find out how many rows were successfully inserted before all the region servers died) usually causes the same behavior: one by one, the region servers just die, even with no MapRed jobs running.
At this point, I'm at a crossroads and beginning to think that I will need to leave HBase behind because I can't spend another week with no progress on this project. So, I ask the question(s) I posed in the beginning.

- Maybe I'm just a freaking idiot
- Maybe HBase is just not equipped to do what I want it to do
- Maybe HBase is just still too unstable and it will do what I need it to do at some point in the future
- Maybe I have the wrong expectations for the amount of hardware I need to throw at the situation

Can someone please point me in the right direction?

Danny
