You should update to 0.2.1 if you can. Make sure you've upped your file descriptors too; see http://wiki.apache.org/hadoop/Hbase/FAQ#6. The same FAQ also describes how to enable DEBUG logging.
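
To see what descriptor limit a process actually gets, something like the following rough Python sketch may help. It only reads the limit of the process it runs in, so run it as the same user (and from the same kind of shell) that starts the datanode and regionserver. MIN_NOFILE is a made-up threshold for illustration, not a number taken from the FAQ.

```python
import resource

# Hypothetical threshold, for illustration only; use whatever value
# the FAQ entry or your own testing suggests.
MIN_NOFILE = 4096


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def soft_limit_ok(soft, minimum=MIN_NOFILE):
    """True if the soft limit is unlimited or at least `minimum`."""
    return soft == resource.RLIM_INFINITY or soft >= minimum


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("open files: soft=%s hard=%s" % (soft, hard))
    if not soft_limit_ok(soft):
        print("soft limit looks low for a datanode/regionserver")
```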

Something odd is up when you see messages like this out of HDFS: ': No live nodes contain current block*'. That's lost data.

Messages like 'compaction completed on region search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' (i.e. compactions taking that long) would seem to indicate your machines are severely overloaded or underpowered or both. Can you study load on these machines while the upload is running? Perhaps try throttling back the upload to see if hbase survives longer?
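
If you want to pull every compaction time out of a regionserver log rather than eyeball them, here is a small Python sketch. It assumes each log record sits on one line (the mail wrapped them) and that durations look like '18mins, 39sec', optionally with a leading 'Nhrs, ' part; that optional part is my assumption about the format, not something shown in your log. The 300 second threshold is arbitrary.

```python
import re

# Matches e.g. "compaction completed on region <name> in 18mins, 39sec".
# The optional "hrs, " part is an assumption about the log format.
COMPACTION_RE = re.compile(
    r"compaction completed on region (?P<region>\S+) in "
    r"(?:(?P<h>\d+)hrs, )?(?:(?P<m>\d+)mins, )?(?P<s>\d+)sec"
)


def compaction_seconds(line):
    """Return (region, seconds) for a compaction line, else None."""
    match = COMPACTION_RE.search(line)
    if match is None:
        return None
    hours = int(match.group("h") or 0)
    minutes = int(match.group("m") or 0)
    seconds = int(match.group("s"))
    return match.group("region"), hours * 3600 + minutes * 60 + seconds


def slow_compactions(lines, threshold_seconds=300):
    """Collect (region, seconds) for compactions slower than the threshold."""
    found = []
    for line in lines:
        parsed = compaction_seconds(line)
        if parsed is not None and parsed[1] > threshold_seconds:
            found.append(parsed)
    return found
```

Feed it an open log file, e.g. slow_compactions(open(path)), where path is wherever your regionserver log lives.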

IIRC, the regionserver outputs a thread dump from its RPC layer on a critical error (an OOME, say) or when it's been hung up for a long time.

Check the '.out' logs of your hbase install too to see if they contain any errors. Grep the datanode logs as well for OOME or "too many open file handles".
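
A plain grep does the job, but if you'd rather sweep several logs in one go, here is a minimal Python sketch of the same check. The marker strings are my guesses at what will show up: 'OutOfMemoryError' for OOME, a case-insensitive 'too many open file' to catch either wording, and the 'No live nodes' message from above. The marker names and the file paths you pass in are up to you.

```python
import re

# Guessed marker strings; adjust to what your logs actually contain.
MARKERS = {
    "oome": re.compile(r"OutOfMemoryError"),
    "open_files": re.compile(r"too many open file", re.IGNORECASE),
    "lost_block": re.compile(r"No live nodes contain current block"),
}


def scan_lines(lines):
    """Return (marker, line_number, text) for every matching line."""
    hits = []
    for number, line in enumerate(lines, 1):
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                hits.append((name, number, line.rstrip("\n")))
    return hits


def scan_file(path):
    """Scan one log file, tolerating odd bytes in the log."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

Run scan_file over each '.out' file and each datanode log, and look at anything it returns.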

St.Ack

Rui Xing wrote:
Hi All,

1). We are doing performance testing on hbase. The environment of the
testing is 3 data nodes, and 1 name node distributed on 4 machines. We
started one region server on each data node respectively. To insert the
data, one insertion client is started on each data node machine. But as the
data inserted, the region servers crashed one by one. One of the reasons is
listed as follows:

*==>
2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
while reading from blk_-806310822584979460 of
/hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*

... ...

*2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
obtain block blk_-806310822584979460 from any node:  java.io.IOExceptionYou
2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region search1,r3_1_3_c157476,1223360357528 in
18mins, 39sec
2008-10-07 14:52:25,238 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/0.0.0.0:60020.compactor exiting
2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed search1,r3_1_3_c157476,1223360357528
2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed -ROOT-,,0
2008-10-07 14:52:25,291 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
10.2.6.104:60020
2008-10-07 14:52:25,291 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
0.0.0.0:60020 exiting
2008-10-07 14:52:25,511 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
thread.
2008-10-07 14:52:25,511 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
===<

2). Another question is, under what circumstances will the region server
print logs of the thread information as below? It appears among the normal
log records.
===>
35 active threads
Thread 1281 (IPC Client connection to d3v1.corp.alimama.com/10.2.6.101:54310):
  State: RUNNABLE
  Blocked count: 0
  Waited count: 0
  Stack:
    java.util.Hashtable.remove(Hashtable.java:435)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
... ...
===<

We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated if any
clues can be dropped.

Regards,
-Ray

