Hey, Todd.
The relevant clipping is:
---
Version: 0.90.4-cdh3u2
..........
Number of Tables: 15
Number of live region servers: 8
Number of dead region servers: 0
.ERROR: Version file does not exist in root dir file:/tmp/hbase-ta/hbase
Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
ERROR: Region xxx found in META, but not in HDFS, and deployed on nx
(times 48)
Summary:
table is okay.
Number of regions: 48
Deployed on: n1, n2, ... n8
---
The summary says the table in question is OK.
For every region I get "found in META, but not in HDFS", but that seems to be
a false positive, as it is reported for all of them (11k+) and the other
tables work fine. And the files are in HDFS, of course.
No mention of any specific region being broken... :(
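(In case it helps, this is roughly how I mean "the files are in HDFS" -- a quick check from the shell, assuming the usual /hbase root dir and with 'table' standing in for the real table name:
---
hadoop fs -ls /hbase/table   # lists one directory per region
hadoop fs -du /hbase/table   # shows the region directories and their sizes
---
)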
T.
On 20.12.2011 at 18:56, Todd Lipcon wrote:
Hi Tomaz,
What does "hbase hbck" report? Maybe you have a broken region of sorts?
-Todd
On Tue, Dec 20, 2011 at 9:45 AM, Tomaz Logar <[email protected]> wrote:
Hello, everybody.
I hit a strange snag in HBase today. I have a table with 48 regions spread
over 8 regionservers. It grows by about one region per day. It's about 6M
small (30-100 bytes each) records at the moment, 3.2G of Snappy-encoded data
on disk.
What happened is that suddenly I can't scan over any previously inserted
data in just one table. Freshly put data seems to be ok:
---
hbase(main):035:0> put 'table', "\x00TEST", "*:t", "TEST"
0 row(s) in 0.0300 seconds
hbase(main):041:0* scan 'table', {STARTROW=>"\x00TEST", LIMIT=>2}
ROW COLUMN+CELL
\x00TEST column=*:t, timestamp=1324392041600, value=TEST
ERROR: java.lang.RuntimeException:
org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-1785731371547934030' does not exist
---
So the scan gets the record I put just before, but times out on the old record
that comes right after it. :(
If I target an old record, I don't even get an exception, just a huge timeout;
there's no exception in the regionserver log either:
---
hbase(main):049:0> scan 'table', {STARTROW=>"0ua", LIMIT=>1}
ROW COLUMN+CELL
0 row(s) in 146.2210 seconds
---
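(One thing I'm tempted to try is bumping the scanner lease period on the regionservers, in case the lease from the exception above is simply expiring between next() calls; a minimal sketch, assuming hbase.regionserver.lease.period is still the relevant property in 0.90.4:
---
<!-- hbase-site.xml on the regionservers; default is 60000 ms -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>300000</value>
</property>
---
Though that wouldn't explain why only old rows in this one table are affected.)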
It may be relevant that I'm getting these warnings on another, much bigger (3T
of Snappy data, 7B+ records), yet working table:
---
11/12/20 17:50:37 WARN ipc.HBaseServer: IPC Server Responder, call
next(-15185895745499515, 1) from 192.168.32.192:64307: output error
11/12/20 17:50:37 WARN ipc.HBaseServer: IPC Server handler 5 on 60020
caught: java.nio.channels.ClosedChannelException
11/12/20 17:32:43 WARN snappy.LoadSnappy: Snappy native library is available
---
But these scans seem to recover while map-reducing.
I'm running hbase-0.90.4-cdh3u2 from the Cloudera SCM bundle on mixed nodes
(5 x 2-core/4G RAM, 3 x 12-core/16G RAM), with 1.5G of RAM allocated to each
HBase regionserver.
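(That 1.5G is the regionserver heap, i.e. roughly the equivalent of this in hbase-env.sh, in case the exact numbers matter:
---
# hbase-env.sh on each regionserver (value in MB)
export HBASE_HEAPSIZE=1500
---
)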
Can anyone share some wisdom? Has anyone solved a similar half-broken problem
before?
Thanks,
T.