Hey, Todd.
Eh, it was all my mistake. I set a TTL of 60 days on data whose
timestamps were all fixed way in the past. The warnings were just side
effects of HBase working hard on what I told it to do. Silly.
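For the archives, the expiry arithmetic that bit me: HBase considers a cell eligible for removal once its timestamp is older than now minus the column family's TTL, so back-dated timestamps plus a 60-day TTL expire everything immediately. A rough sketch of just that cutoff check (plain Python, not HBase code; the function name is mine):

```python
import time

TTL_DAYS = 60
TTL_MS = TTL_DAYS * 24 * 60 * 60 * 1000  # TTL as milliseconds, to compare against cell timestamps

def is_expired(cell_timestamp_ms, now_ms=None):
    """A cell is eligible for expiry once it is older than the family TTL."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - cell_timestamp_ms > TTL_MS

# A timestamp fixed way in the past (epoch 0, say) is expired on sight:
print(is_expired(0))                        # True
# A freshly written cell survives, which is exactly what I saw in the shell:
print(is_expired(int(time.time() * 1000)))  # False
```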
Silver lining - your hint about hbase.rootdir fixed my hbck. :) Thanks.
Sorry for wasting your time. :(
T.
On 12/20/2011 09:25 PM, Todd Lipcon wrote:
Hi Tomaz,
That's fishy - are you sure the client you're running hbck from has
hbase.rootdir properly set? It needs to have the HBase configuration
dir on its classpath. (It's possible, since you're using SCM, that the
client isn't getting this configuration exported.)
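In case it helps anyone searching the archives: the symptom of a missing config is hbck resolving hbase.rootdir to a local /tmp path instead of HDFS. One way to point the client at the real config (the path below is the usual CDH default and an assumption; adjust to your install):

```shell
# Make the client pick up the cluster's hbase-site.xml so that
# hbase.rootdir resolves to HDFS rather than the local /tmp default.
export HBASE_CONF_DIR=/etc/hbase/conf
hbase hbck
```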
-Todd
On Tue, Dec 20, 2011 at 12:21 PM, Tomaz Logar <[email protected]> wrote:
Hey, Todd.
The relevant clipping is:
---
Version: 0.90.4-cdh3u2
..........
Number of Tables: 15
Number of live region servers: 8
Number of dead region servers: 0
.ERROR: Version file does not exist in root dir file:/tmp/hbase-ta/hbase
Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
ERROR: Region xxx found in META, but not in HDFS, and deployed on nx (times
48)
Summary:
table is okay.
Number of regions: 48
Deployed on: n1, n2, ... n8
---
Summary says the table in question is ok.
For every region I get "found in META, but not in HDFS", but that seems
to be a false positive, as it is reported for all of them (11k+) and the
other tables work fine. And the files are in HDFS, of course.
No mention of any specific region being broken... :(
T.
On 20.12.2011 18:56, Todd Lipcon wrote:
Hi Tomaz,
What does "hbase hbck" report? Maybe you have a broken region of sorts?
-Todd
On Tue, Dec 20, 2011 at 9:45 AM, Tomaz Logar <[email protected]> wrote:
Hello, everybody.
I hit a strange snag in HBase today. I have a table with 48 regions
spread over 8 regionservers. It grows by about one region per day. It's
around 6M small records (30-100 bytes each) at the moment, 3.2G of
Snappy-encoded data on disk.
What happened is that suddenly I can't scan over any previously inserted
data in just one table. Freshly put data seems to be ok:
---
hbase(main):035:0> put 'table', "\x00TEST", "*:t", "TEST"
0 row(s) in 0.0300 seconds
hbase(main):041:0* scan 'table', {STARTROW=>"\x00TEST", LIMIT=>2}
ROW COLUMN+CELL
\x00TEST column=*:t, timestamp=1324392041600, value=TEST
ERROR: java.lang.RuntimeException:
org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-1785731371547934030' does not exist
---
So the scan returns the record I just put, but times out on the old
record that comes right after it. :(
If I target an old record directly I don't even get an exception, just a
huge timeout, and no exception in the regionserver log either:
---
hbase(main):049:0> scan 'table', {STARTROW=>"0ua", LIMIT=>1}
ROW COLUMN+CELL
0 row(s) in 146.2210 seconds
---
It may be relevant that I'm getting these on another, much bigger (3T of
Snappy, 7B+ records) but still working table:
---
11/12/20 17:50:37 WARN ipc.HBaseServer: IPC Server Responder, call
next(-15185895745499515, 1) from 192.168.32.192:64307: output error
11/12/20 17:50:37 WARN ipc.HBaseServer: IPC Server handler 5 on 60020
caught: java.nio.channels.ClosedChannelException
11/12/20 17:32:43 WARN snappy.LoadSnappy: Snappy native library is
available
---
But those scans seem to recover while map-reducing.
I'm running hbase-0.90.4-cdh3u2 from the Cloudera SCM bundle on mixed
nodes (5 * 2-core 4G RAM, 3 * 12-core 16G RAM) with 1.5G RAM allocated
to each HBase regionserver.
Can anyone share some wisdom? Has anyone solved a similar half-broken
problem before?
Thanks,
T.