Do you have DEBUG enabled? Can I see the log from the regionserver that went
down? Can you tell me more about your cluster: number of nodes, number
of regions? What does your uploader look like (is it an MR job)? Have you
upped your file descriptors?
Thanks Slava.
St.Ack
Slava Gorelik wrote:
Hi. I'm also encountering an error like this.
I'm using HBase 0.18.0 and Hadoop 0.18.0.
In addition to this error, region servers sometimes die. In the log I see
the region server shut down after starting a compaction, because some data
blocks are not found.
Best Regards.
On Wed, Oct 8, 2008 at 11:29 PM, stack <[EMAIL PROTECTED]> wrote:
You should update to 0.2.1 if you can. Make sure you've upped your file
descriptors too: see http://wiki.apache.org/hadoop/Hbase/FAQ#6. Also see
how to enable DEBUG in the same FAQ.
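
A quick way to see which open-file limit a process actually gets is to ask the OS from that same user account. The sketch below is only an illustration: it uses Python's standard `resource` module (Unix only), and the 1024 threshold and the helper names are assumptions made for the example, not values taken from the FAQ.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def looks_too_low(soft, threshold=1024):
    """Flag a soft limit at or below `threshold`.

    The default of 1024 is an illustrative assumption (a common OS default),
    not a number recommended anywhere in this thread.
    """
    return soft != resource.RLIM_INFINITY and soft <= threshold


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open files: soft={soft} hard={hard}")
    if looks_too_low(soft):
        print("soft limit looks low for a loaded datanode/regionserver")
```

Run it as the user that starts the datanode and regionserver, since limits are per user and per session.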
Something odd is up when you see messages like this out of HDFS: 'No live
nodes contain current block'. That's lost data.
Or messages like this, 'compaction completed on region
search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. that
compactions are taking so long -- would seem to indicate your machines are
severely overloaded or underpowered or both. Can you study the load on these
machines while the upload is running? Perhaps try throttling back to see if
hbase survives longer?
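
To spot slow compactions across a whole regionserver log rather than by eye, something like the following could pull out the durations. It is a rough sketch that assumes the message always looks like the one quoted above ('... in 18mins, 39sec', on a single line in the real log file); treating the minutes part as optional is a guess, and other duration formats are not handled. The function name is made up for the example.

```python
import re

# Matches the message format quoted in this thread. The minutes part is
# treated as optional, which is an assumption about shorter compactions.
COMPACTION = re.compile(
    r"compaction completed on region (\S+) in (?:(\d+)mins, )?(\d+)sec"
)


def compaction_seconds(line):
    """Return (region, duration_in_seconds), or None if the line does not match."""
    match = COMPACTION.search(line)
    if match is None:
        return None
    region, mins, secs = match.groups()
    return region, int(mins or 0) * 60 + int(secs)


if __name__ == "__main__":
    sample = (
        "2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver."
        "HRegion: compaction completed on region "
        "search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec"
    )
    print(compaction_seconds(sample))
```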
The regionserver will output a thread dump in its RPC layer on a critical
error -- an OOME -- or if it's been hung up for a long time, IIRC.
Check the '.out' logs for your hbase install too, to see if they contain any
errors. Grep the datanode logs too for OOME or "too many open file
handles".
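
If plain grep is awkward across many log files, a small scanner along these lines would do the same job. This is a sketch: the pattern covers the two JVM messages I'd expect to see (`java.lang.OutOfMemoryError` and `Too many open files`), and the log paths are whatever you pass on the command line, since I don't know your layout.

```python
import re
import sys

# The two messages of interest, matched case-insensitively.
ERRORS = re.compile(r"OutOfMemoryError|too many open files", re.IGNORECASE)


def find_errors(lines):
    """Return (line_number, text) for every line that mentions either error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERRORS.search(line)
    ]


if __name__ == "__main__":
    # Usage: python scan_logs.py <logfile> [<logfile> ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, text in find_errors(handle):
                print(f"{path}:{number}: {text}")
```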
St.Ack
Rui Xing wrote:
Hi All,
1). We are doing performance testing on hbase. The testing environment is
3 data nodes and 1 name node distributed across 4 machines. We started one
region server on each data node. To insert the data, one insertion client is
started on each data node machine. But as the data was inserted, the region
servers crashed one by one. One of the reasons is listed as follows:
===>
2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
while reading from blk_-806310822584979460 of
/hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream
... ...
2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
obtain block blk_-806310822584979460 from any node:
java.io.IOException: No live nodes contain current block
2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region search1,r3_1_3_c157476,1223360357528 in
18mins, 39sec
2008-10-07 14:52:25,238 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/0.0.0.0:60020.compactor exiting
2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed search1,r3_1_3_c157476,1223360357528
2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed -ROOT-,,0
2008-10-07 14:52:25,291 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
10.2.6.104:60020
2008-10-07 14:52:25,291 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
0.0.0.0:60020 exiting
2008-10-07 14:52:25,511 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
thread.
2008-10-07 14:52:25,511 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread
complete
===<
2). Another question: under what circumstances will the region server
print thread information to the log, as below? It appears among the normal
log records.
===>
35 active threads
Thread 1281 (IPC Client connection to
d3v1.corp.alimama.com/10.2.6.101:54310
):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
java.util.Hashtable.remove(Hashtable.java:435)
org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
... ...
===<
We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated if
anyone could offer any clues.
Regards,
-Ray