Can you provide more information about the hardware on your nodes?

I think I saw you had 13 mappers running, and only 512MB of heap for the regionserver? That is a very small heap for HBase to run with.

How many cores do you have and what are your disks like? 13 mappers per node is WAY too high. If you have 4 core nodes and you are running the regionserver with the datanode and also trying to get MR tasks on the same nodes, you should probably not go over 2 concurrent mappers per node.
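For reference, the concurrent-mapper cap is set per tasktracker. A minimal mapred-site.xml fragment (property name as in Hadoop 0.19/0.20; the value of 2 matches the suggestion above) might look like:

```xml
<!-- mapred-site.xml: cap concurrent map tasks per tasktracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```

This is read by each tasktracker at startup, so the tasktrackers need a restart for the change to take effect.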

HBase and Hadoop will try to do things as fast as possible, but if you don't give them sufficient resources and you spin up heavy amounts of load, you'll start to see weird behavior.

As far as robustness of HBase w.r.t. missing blocks, there's not much HBase can do. HBase uses HDFS as its persistent storage. If the blocks are not available, then your data is simply not available. This would be the same for any database on any filesystem: if the filesystem says the file doesn't exist, the database can't do much.
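As a sanity check on the HDFS side (a generic fsck invocation, not specific to your setup), you can ask the namenode to report per-file block health:

```shell
# List each file's block status; files with missing or corrupt
# blocks are called out explicitly in the report
./bin/hadoop fsck / -files -blocks -locations
```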

It does seem that your issues are from too much load on insufficient resources. And I would not expect 0.20 to behave better in that respect, as it now uses the CMS garbage collector and is a bit more resource hungry than its predecessors (more resources, but far better performance).

JG

stack wrote:
On Tue, Aug 18, 2009 at 10:07 AM, stephen mulcahy
<[email protected]>wrote:

If you can shutdown hbase, then 3. is for sure the way to go -- it's
complete and runs quickest.  I'm surprised though that it would complain
of missing blocks when fsck does not.

Yeah, I was too. I figured a clean bill of health from fsck was good enough
- but it looks like it missed something. Does it seem likely my hbase is
somehow corrupt, or is it robust enough to tolerate those missing blocks?
Running a count on my old and new hbase - it looks like my new hbase (from
the backup) has slightly fewer rows ... but is much, much faster.

Is there a hbase fsck or verification process?


We are working on it (HBASE-7).  Meantime, try the rowcounter MR program.
It reads all rows in your table.  If there's a problem, it'll fail:
./bin/hadoop jar hbase-X.X.X.jar rowcounter



 Can we get you to migrate to 0.20.0?
I plan to. But I wasn't clear it was safe to yet - is it? :)


It's as safe -- safer -- than 0.19.x.

Let us know if you need a bit of help with migration.
St.Ack
