Can you provide more information about the hardware on your nodes?
I think I saw you had 13 mappers running? And only 512MB of heap for
the regionserver? That is a very small amount of heap for HBase to run.
How many cores do you have and what are your disks like? 13 mappers per
node is WAY too high. If you have 4 core nodes and you are running the
regionserver with the datanode and also trying to get MR tasks on the
same nodes, you should probably not go over 2 concurrent mappers per node.
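To make that concrete, on a 4-core node the per-node task slots and the regionserver heap might be tuned along these lines (paths and values here are illustrative for that setup, not from the original thread):

```shell
# conf/hbase-env.sh -- raise the regionserver heap above 512MB
# (example value; size to your node's RAM)
export HBASE_HEAPSIZE=2000

# conf/mapred-site.xml -- cap concurrent map tasks per tasktracker, e.g.:
#   <property>
#     <name>mapred.tasktracker.map.tasks.maximum</name>
#     <value>2</value>
#   </property>
```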
HBase and Hadoop will try to do things as fast as possible, but if you
don't give them sufficient resources and spin up heavy amounts of load,
you'll start to see weird behavior.
As far as robustness of HBase w.r.t. missing blocks, there's not much
HBase can do. HBase uses HDFS as its persistent storage. If the
blocks are not available, then your data is simply not available. This
would be the same for any database on any filesystem: if the filesystem
says the file doesn't exist, the database can't do much.
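To see what HDFS itself thinks, fsck can be pointed straight at HBase's storage directory (the /hbase path below is the default hbase.rootdir; adjust for your install):

```shell
# Report missing or corrupt blocks under HBase's directory in HDFS.
# /hbase is the default hbase.rootdir -- change it if yours differs.
./bin/hadoop fsck /hbase -files -blocks -locations
```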
It does seem that your issues are from too much load on insufficient
resources. And I would not expect 0.20 to behave better in that
respect, as it now uses the CMS garbage collector and is a bit more
resource-hungry than its predecessors (more resources, but far better
performance).
JG
stack wrote:
On Tue, Aug 18, 2009 at 10:07 AM, stephen mulcahy
<[email protected]>wrote:
If you can shut down hbase, then 3. is for sure the way to go -- it's
complete and runs quickest. I'm surprised though that it would complain
of missing blocks when fsck does not.
Yeah, I was too. I figured a clean bill of health from fsck was good enough
- but it looks like it missed something. Does it seem likely my hbase is
somehow corrupt or is it robust enough to tolerate those missing blocks?
Running a count on my old and new hbase - it looks like my new hbase (from
the backup) has slightly fewer rows ... but is much, much faster.
Is there a hbase fsck or verification process?
We are working on it (HBASE-7). Meantime, try the rowcounter MR program.
It reads all rows in your table; if there's a problem, it'll fail: ./bin/hadoop jar
hbase-X.X.X.jar rowcounter
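A full invocation looks something like the following; the table name, column family, and output path are placeholders, and the argument order is from memory -- running the jar with no arguments prints the exact usage for your version:

```shell
# Count all rows in a table; the MR job fails if any row can't be read.
# 'mytable', 'myfamily:' and the output dir are placeholders.
./bin/hadoop jar hbase-X.X.X.jar rowcounter /tmp/rowcount-out mytable myfamily:
```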
Can we get you to migrate to 0.20.0?
I plan to. But I wasn't clear it was safe to yet - is it? :)
It's as safe as -- safer than -- 0.19.x.
Let us know if you need a bit of help with migration.
St.Ack