At StumbleUpon we have north of 20 billion rows, each of 100-200 bytes. Look in your datanode log for the errors described in http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5 or in http://wiki.apache.org/hadoop/Hbase/FAQ#A6

J-D

On Wed, Apr 7, 2010 at 9:55 AM, Geoff Hendrey <ghend...@decarta.com> wrote:
> Hi,
>
> I am running an HBase instance in pseudo-distributed mode, on top of a
> pseudo-distributed HDFS, on a single machine. I have a 10-node map/reduce
> cluster that uses a TableMapper to drive a map/reduce job. In the map
> phase, two Gets are executed against HBase. The map phase generates two
> orders of magnitude more data than was pumped in, and in the reduce phase
> we do some consolidation of the generated data, then execute a Put into
> HBase with autocommit=false and the batch size set to 100,000 (I tried
> 1,000 and 10,000 as well and found 100,000 worked best). I am using 32
> reducers, and reduce seems to run 1000x slower than mapping.
>
> Unfortunately, the job consistently crashes at around 85% reduce
> completion, with HDFS-related errors from the HBase machine:
>
> java.io.IOException: java.io.IOException: All datanodes 127.0.0.1:50010 are bad. Aborting...
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2525)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2078)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2241)
>
> So I am clearly aware of the mismatch between the big map/reduce cluster
> and the wimpy HBase installation, but why am I seeing consistent crashes?
> Shouldn't the HBase cluster just be slower, not unreliable?
>
> Here is my main question: should I expect that running a "real" HBase
> cluster will solve my problems, and does anyone have experience with a
> map/reduce job that pumps several billion rows into HBase?
>
> -geoff
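For readers trying to map Geoff's description onto the client API: in the HBase 0.20-era client, "autocommit=false with a batch size" is most likely setAutoFlush(false) on the HTable, combined with either a client-side write buffer or an explicit list of Puts flushed every N rows. The sketch below shows the write-buffer variant and is only an illustration of that pattern, not code from the thread; the table name, column family, qualifier, and buffer size are placeholder assumptions.

    // Minimal sketch of buffered Puts against the HBase 0.20-era client API.
    // "my_table", "cf", "q" and the 12 MB buffer are placeholders, not values from the thread.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedPutSketch {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "my_table");
        table.setAutoFlush(false);                  // "autocommit=false": buffer Puts client-side
        table.setWriteBufferSize(12 * 1024 * 1024); // send buffered Puts once ~12 MB accumulate

        for (long i = 0; i < 100000; i++) {         // stand-in for the reducer's output loop
          Put put = new Put(Bytes.toBytes("row-" + i));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
          table.put(put);                           // queued in the write buffer, not sent per row
        }

        table.flushCommits();                       // push whatever is still buffered
        table.close();
      }
    }

One consequence of this pattern: with autoFlush off, Puts only reach the region servers when the buffer fills or flushCommits()/close() is called, so an exception raised at flush time can surface long after the Put that actually triggered it.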