Re: region server problem

stack Wed, 08 Oct 2008 14:30:23 -0700

You should update to 0.2.1 if you can. Make sure you've upped your file descriptors too: see http://wiki.apache.org/hadoop/Hbase/FAQ#6. Also see how to enable DEBUG in the same FAQ.
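
For a quick look at what file descriptor limit a process actually gets, something like the sketch below may help. It is only an illustration: the function names and the 1024 threshold are assumptions made up for this example, not values from the FAQ, and it reports the limit for whichever user runs it, so it should be run as the user that starts the regionserver and datanode.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def looks_too_low(soft, threshold=1024):
    """Flag a soft limit at or below `threshold`.

    The threshold is an illustrative assumption; pick whatever
    value suits your cluster.
    """
    # RLIM_INFINITY means "no limit", which is never too low.
    return soft != resource.RLIM_INFINITY and soft <= threshold


soft, hard = nofile_limits()
print("open files: soft=%s hard=%s too_low=%s" % (soft, hard, looks_too_low(soft)))
```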

Something odd is up when you see messages like this out of HDFS: 'No live nodes contain current block'. That's lost data.

Or messages like this, 'compaction completed on region search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. that compactions are taking so long -- would seem to indicate your machines are severely overloaded or underpowered or both. Can you study load when the upload is running on these machines? Perhaps try throttling back to see if hbase survives longer?
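
To see how often compactions run this long, a rough sketch along these lines could pull the durations out of a regionserver log. It assumes each log record sits on a single line in the 'in 18mins, 39sec' shape quoted above (minutes optional); the function name and the 300-second threshold are hypothetical, chosen only for illustration.

```python
import re

# Matches e.g. "... compaction completed on region <name> in 18mins, 39sec".
# The minutes part is treated as optional (an assumption for short runs).
COMPACTION_RE = re.compile(
    r"compaction completed on region (?P<region>\S+) in "
    r"(?:(?P<mins>\d+)mins, )?(?P<secs>\d+)sec"
)


def slow_compactions(lines, threshold_secs=300):
    """Yield (region, seconds) for compactions slower than threshold_secs."""
    for line in lines:
        match = COMPACTION_RE.search(line)
        if match is None:
            continue
        total = int(match.group("mins") or 0) * 60 + int(match.group("secs"))
        if total > threshold_secs:
            yield match.group("region"), total
```

Feeding it an open log file, e.g. `list(slow_compactions(open(path)))`, gives the regions worth correlating with machine load at those times.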

The regionserver will output a thread dump in its RPC layer on a critical error -- OOME -- or if it's been hung up for a long time, IIRC.

Check the '.out' logs too for your hbase install to see if they contain any errors. Grep the datanode logs too for OOME or "too many open file handles".
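
If plain grep is awkward across many files, the same check can be sketched in a few lines of Python. The pattern names and helper functions here are made up for the example, and the exact wording of the messages in your logs may differ, so the patterns are kept deliberately loose.

```python
import re

# Loose patterns; adjust to whatever wording your logs actually use.
PATTERNS = {
    "oome": re.compile(r"OutOfMemoryError"),
    "open_files": re.compile(r"too many open file", re.IGNORECASE),
}


def scan_lines(lines):
    """Return {pattern_name: [matching lines]} for an iterable of log lines."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def scan_file(path):
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

Calling `scan_file` on each '.out' and datanode log and printing any non-empty lists is usually enough to tell which of the two problems, if either, is in play.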


St.Ack

Rui Xing wrote:

Hi All,

1). We are doing performance testing on hbase. The environment of the
testing is 3 data nodes, and 1 name node distributed on 4 machines. We
started one region server on each data node respectively. To insert the
data, one insertion client is started on each data node machine. But as the
data inserted, the region servers crashed one by one. One of the reasons is
listed as follows:

*==>
2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
while reading from blk_-806310822584979460 of
/hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*

... ...

*2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
obtain block blk_-806310822584979460 from any node:  java.io.IOException
2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region search1,r3_1_3_c157476,1223360357528 in
18mins, 39sec
2008-10-07 14:52:25,238 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/0.0.0.0:60020.compactor exiting
2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed search1,r3_1_3_c157476,1223360357528
2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed -ROOT-,,0
2008-10-07 14:52:25,291 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
10.2.6.104:60020
2008-10-07 14:52:25,291 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
0.0.0.0:60020 exiting
2008-10-07 14:52:25,511 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
thread.
2008-10-07 14:52:25,511 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
===<

2). Another question is, under what circumstances will the region server
print logs of the thread information as below? It appears among the normal
log records.
===>
35 active threads
Thread 1281 (IPC Client connection to d3v1.corp.alimama.com/10.2.6.101:54310
):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.util.Hashtable.remove(Hashtable.java:435)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
... ...
===<

We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated if any
clues can be dropped.

Regards,
-Ray
