Hi Tomaz,

That's fishy - are you sure the client you're running hbck from has hbase.rootdir properly set? It needs to have the hbase configuration dir on its classpath. (It's possible, since you're using SCM, that the client isn't getting this configuration exported.)
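One quick way to check: the "file:/tmp/hbase-ta/hbase" rootdir in your hbck output below is the built-in local-filesystem default (file:///tmp/hbase-${user.name}/hbase), which is exactly what hbck reports when it never picked up your site config. For comparison, a correctly configured client's hbase-site.xml points rootdir at HDFS; the NameNode host and port below are placeholders, not your actual values:

---
<!-- hbase-site.xml on the machine you run hbck from -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode.example.com:8020/hbase</value>
</property>
---

Pointing HBASE_CONF_DIR at the directory holding that file before invoking hbck is usually enough to get it on the classpath.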
-Todd

On Tue, Dec 20, 2011 at 12:21 PM, Tomaz Logar <[email protected]> wrote:
>
> Hey, Todd.
>
> The relevant clipping is:
> ---
> Version: 0.90.4-cdh3u2
> ..........
> Number of Tables: 15
> Number of live region servers: 8
> Number of dead region servers: 0
> .ERROR: Version file does not exist in root dir file:/tmp/hbase-ta/hbase
> Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
>
> ERROR: Region xxx found in META, but not in HDFS, and deployed on nx (times 48)
>
> Summary:
> table is okay.
> Number of regions: 48
> Deployed on: n1, n2, ... n8
> ---
>
> Summary says the table in question is ok.
>
> For every region I get "found in META, but not in HDFS", but that seems a false positive, as it is reported for all of them (11k+) and other tables work ok. And the files are in HDFS, of course.
>
> No mention of any specific region being broken... :(
>
>
> T.
>
> On 20.12.2011 18:56, Todd Lipcon wrote:
>
>> Hi Tomaz,
>>
>> What does "hbase hbck" report? Maybe you have a broken region of sorts?
>>
>> -Todd
>>
>> On Tue, Dec 20, 2011 at 9:45 AM, Tomaz Logar <[email protected]> wrote:
>>>
>>> Hello, everybody.
>>>
>>> I hit a strange snag in HBase today. I have a table with 48 regions spread over 8 regionservers. It grows by about one region per day. It holds about 6M small (30-100 bytes each) records at the moment, 3.2G of Snappy-encoded data on disk.
>>>
>>> What happened is that suddenly I can't scan over any previously inserted data in just one table. Freshly put data seems to be ok:
>>>
>>> ---
>>> hbase(main):035:0> put 'table', "\x00TEST", "*:t", "TEST"
>>> 0 row(s) in 0.0300 seconds
>>>
>>> hbase(main):041:0* scan 'table', {STARTROW=>"\x00TEST", LIMIT=>2}
>>> ROW COLUMN+CELL
>>> \x00TEST column=*:t, timestamp=1324392041600, value=TEST
>>> ERROR: java.lang.RuntimeException:
>>> org.apache.hadoop.hbase.regionserver.LeaseException:
>>> org.apache.hadoop.hbase.regionserver.LeaseException: lease
>>> '-1785731371547934030' does not exist
>>> ---
>>>
>>> So the scan returns the record I put just before, but times out on the old record that comes right after it. :(
>>>
>>> If I target an old record I don't even get an exception, just a huge timeout, and no exception in the regionserver log either:
>>> ---
>>> hbase(main):049:0> scan 'table', {STARTROW=>"0ua", LIMIT=>1}
>>> ROW COLUMN+CELL
>>> 0 row(s) in 146.2210 seconds
>>> ---
>>>
>>> It may be relevant that I'm getting these on another, much bigger (3T Snappy, 7+B records), yet working table:
>>> ---
>>> 11/12/20 17:50:37 WARN ipc.HBaseServer: IPC Server Responder, call
>>> next(-15185895745499515, 1) from 192.168.32.192:64307: output error
>>> 11/12/20 17:50:37 WARN ipc.HBaseServer: IPC Server handler 5 on 60020
>>> caught: java.nio.channels.ClosedChannelException
>>> 11/12/20 17:32:43 WARN snappy.LoadSnappy: Snappy native library is available
>>> ---
>>> But these scans seem to recover while map-reducing.
>>>
>>> I'm running hbase-0.90.4-cdh3u2 from the Cloudera SCM bundle on mixed nodes (5 * 2-core 4G RAM, 3 * 12-core 16G RAM) with 1.5G RAM allocated to each HBase regionserver.
>>>
>>>
>>> Can anyone share some wisdom? Has anyone solved a similar half-broken problem before?
>>>
>>>
>>> Thanks,
>>>
>>> T.
>>>

--
Todd Lipcon
Software Engineer, Cloudera
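A note on the LeaseException in the quoted scan: the regionserver expires a scanner's lease when the client waits longer than the lease period between next() calls, so the late next() fails with "lease ... does not exist". If slow per-row processing turns out to be the trigger, one sketch of a workaround is widening the server-side lease window; hbase.regionserver.lease.period is the 0.90-era setting (default 60000 ms), but the 120000 below is only an illustrative value:

---
<!-- hbase-site.xml on each regionserver; needs a regionserver restart -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>120000</value>
</property>
---

The gentler alternative is lowering scanner caching (Scan.setCaching in the Java client) so each next() round trip carries fewer rows and stays well inside the lease window.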
