Try this...

1. run hadoop fsck /
2. shut down hbase
3. mv /hbase to /hbase.old
4. restart hbase (optional, just for a sanity check)
5. copy /hbase.old back to /hbase
6. restart
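For anyone following along, the steps above translate roughly into the commands below. This is a sketch only: it assumes hbase.rootdir is /hbase in HDFS and that the Hadoop and HBase bin/ scripts are on the PATH. The run() wrapper just echoes each command, so the sequence can be read through as a dry run; swap it for real execution on an actual cluster.

```shell
# Dry-run wrapper: prints each command instead of executing it.
# On a real cluster, replace with:  run() { "$@"; }
run() { echo "+ $*"; }

run hadoop fsck /                       # 1. check HDFS health first
run stop-hbase.sh                       # 2. shut down HBase
run hadoop fs -mv /hbase /hbase.old     # 3. move the HBase root dir aside
run start-hbase.sh                      # 4. restart (optional sanity check
run stop-hbase.sh                       #    with an empty root dir)
run hadoop fs -cp /hbase.old /hbase     # 5. copy the data back
run start-hbase.sh                      # 6. restart HBase
```

Note that step 5 copies rather than moves, so /hbase.old survives as an untouched backup until everything comes up cleanly.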
This may not help, but it can't hurt. Depending on the size of your HBase database, it could take a while. On our sandbox, we suffer from ZooKeeper and HBase failures when there's a heavy load on the network. (Don't ask; the sandbox was just a play area on whatever hardware we could find.) Doing this copy cleaned up a database that wouldn't fully come up. It may do the same for you.

HTH
-Mike

> Date: Wed, 17 Feb 2010 10:50:59 -0500
> Subject: Re: hbase shell count crashes
> From: bmdevelopm...@gmail.com
> To: hbase-user@hadoop.apache.org
>
> Hi,
> So after a few more attempts and crashes from trying the shell count,
> I ran the MR rowcounter and noticed that the number of rows was lower
> than it should have been, even on smaller test tables.
> This led me to start looking through the logs and to perform a few
> compactions on .META. and restarts of HBase. Unfortunately, two tables
> are now entirely missing; they no longer show up under the shell's
> list command.
>
> I'm not entirely sure what to look for in the logs, but I've noticed a
> lot of this in the master log:
>
> 2010-02-16 23:59:25,856 WARN org.apache.hadoop.hbase.master.HMaster:
> info:regioninfo is empty for row:
> UserData_0209,e834d76faddee14b,1266316478685; has keys: info:server,
> info:serverstartcode
>
> I also came across this in the regionserver log:
>
> 2010-02-16 23:58:33,851 WARN
> org.apache.hadoop.hbase.regionserver.Store: Skipping
> hdfs://upp1.bmeu.com:50001/hbase/.META./1028785192/info/4080287239754005013
> because its empty. HBASE-646 DATA LOSS?
>
> Any ideas on whether the tables are recoverable? It's not a big deal for me to
> re-insert from scratch, as this is still in the testing phase,
> but I would be curious to find out what has led to these issues, in order
> to possibly fix them or at least not repeat them.
>
> Thanks
>
> On Tue, Feb 16, 2010 at 2:43 PM, Bluemetrix Development
> <bmdevelopm...@gmail.com> wrote:
> > Hi, thanks for the explanation.
> >
> > Yes, I was able to cat the file from all three of my region servers:
> > hadoop fs -cat /hbase/.META./1028785192/info/8254845156484129698 > tmp.out
> >
> > I had never come across this before, but this is the first time I've
> > had 7M rows in the DB.
> > Is there anything going on that would bog down the network and cause
> > this file to be unreachable?
> >
> > I have 3 servers. The master is running the jobtracker, namenode and
> > hmaster,
> > and all 3 are running datanodes, regionservers and zookeeper.
> >
> > Appreciate the help.
> >
> > On Tue, Feb 16, 2010 at 2:11 PM, Jean-Daniel Cryans <jdcry...@apache.org>
> > wrote:
> >> This line:
> >>
> >> java.io.IOException: java.io.IOException: Could not obtain block:
> >> blk_-6288142015045035704_88516
> >> file=/hbase/.META./1028785192/info/8254845156484129698
> >>
> >> means that the region server wasn't able to fetch a block for the .META.
> >> table (the table where all region addresses are stored). Are you able
> >> to open that file using the bin/hadoop command-line utility?
> >>
> >> J-D
> >>
> >> On Tue, Feb 16, 2010 at 11:08 AM, Bluemetrix Development <
> >> bmdevelopm...@gmail.com> wrote:
> >>
> >>> Hi,
> >>> I'm currently trying to run a count in the hbase shell and it crashes
> >>> right towards the end.
> >>> This in turn seems to crash hbase, or at least causes the regionservers
> >>> to become unavailable.
> >>>
> >>> Here's the tail end of the count output:
> >>> http://pastebin.com/m465346d0
> >>>
> >>> I'm on version 0.20.2 and running this command:
> >>> > count 'table', 1000000
> >>>
> >>> Anyone with similar issues or ideas on this?
> >>> Please let me know if you need further info.
> >>> Thanks
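As a follow-up for anyone hitting the same "Could not obtain block" error: the check J-D suggests, plus a block-level look at the same file, can be sketched as below. The store-file path is the one reported in the logs in this thread; as a sketch, the run() wrapper only echoes each command, so replace it with real execution on the cluster.

```shell
# Dry-run wrapper: prints each command instead of executing it.
# On a real cluster, replace with:  run() { "$@"; }
run() { echo "+ $*"; }

# Suspect .META. store file (path taken from the error in this thread)
F=/hbase/.META./1028785192/info/8254845156484129698

# Reading the whole file fails with "Could not obtain block" if any block is unreachable
run hadoop fs -cat "$F"

# Show each block of the file and which datanodes hold replicas of it
run hadoop fsck "$F" -files -blocks -locations
```

If fsck reports missing or corrupt blocks for the file, the data never made it to (or was lost from) every replica, which would point at HDFS rather than HBase itself.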