You should update to 0.2.1 if you can. Make sure you've upped your file
descriptors too: see http://wiki.apache.org/hadoop/Hbase/FAQ#6. The same
FAQ also describes how to enable DEBUG logging.
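If it helps, here is a rough sketch for checking what open-file limit a
process actually gets. It only uses Python's standard `resource` module
(Unix only). The 1024 threshold and the `is_low` helper are my own
assumptions for illustration, not something taken from the FAQ; run it as
the same user that starts the hbase and hadoop daemons, since limits are
per user and per shell.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_low(soft, threshold=1024):
    """True if the soft limit is at or below `threshold`.

    The 1024 default is an assumed cut-off (a common out-of-the-box value),
    not an official recommendation. An unlimited setting is never "low".
    """
    if soft == resource.RLIM_INFINITY:
        return False
    return soft <= threshold


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open files: soft={soft} hard={hard}")
    if is_low(soft):
        print("soft limit looks low for an HBase/HDFS node; see the FAQ entry")
```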
Something odd is up when you see messages like this out of HDFS: ': No
live nodes contain current block*'. That's lost data.
Messages like this, 'compaction completed on region
search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e.
compactions taking that long -- would seem to indicate your machines
are severely overloaded or underpowered or both. Can you study load on
these machines while the upload is running? Perhaps try throttling back
the upload to see if hbase survives longer?
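To get a picture of how long compactions are taking overall, something
like the sketch below could pull the durations out of a regionserver log.
It assumes each record sits on a single line in the file and that the
duration always looks like the one quoted above, with an optional leading
"Nhrs, " part (that part is my assumption). The function names and the
300-second threshold are made up for illustration.

```python
import re

# Matches e.g. "compaction completed on region <name> in 18mins, 39sec".
# The hours and minutes parts are treated as optional.
PATTERN = re.compile(
    r"compaction completed on region (?P<region>\S+) in "
    r"(?:(?P<h>\d+)hrs, )?(?:(?P<m>\d+)mins, )?(?P<s>\d+)sec"
)


def compaction_seconds(line):
    """Return (region, duration_in_seconds), or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    secs = int(m["h"] or 0) * 3600 + int(m["m"] or 0) * 60 + int(m["s"])
    return m["region"], secs


def slow_compactions(lines, threshold_secs=300):
    """Yield (region, seconds) for compactions slower than the threshold."""
    for line in lines:
        parsed = compaction_seconds(line)
        if parsed is not None and parsed[1] > threshold_secs:
            yield parsed
```

Feeding it the lines of an open log file, e.g.
`list(slow_compactions(open(path)))`, gives the regions whose compactions
ran long, which you can then line up against machine load at those times.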
IIRC, the regionserver outputs a thread dump from its RPC layer on a
critical error -- an OOME, say -- or if it's been hung up for a long time.
Check the '.out' logs too for your hbase install to see if they contain
any errors. Grep the datanode logs too for OOME or "too many open file
handles".
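If plain grep gets tedious across several machines, a small script along
these lines counts the two symptoms per log file. The exact strings are
assumptions on my part: I'm matching "OutOfMemoryError" for OOME and a
case-insensitive "too many open file" for the descriptor problem, so
adjust them to whatever your logs actually say. Pass the log paths on the
command line.

```python
import re
import sys
from collections import Counter

# Assumed spellings of the two symptoms; tune to match your logs.
PATTERNS = {
    "OOME": re.compile(r"OutOfMemoryError"),
    "open files": re.compile(r"too many open file", re.IGNORECASE),
}


def scan(lines):
    """Count how many lines match each symptom pattern."""
    hits = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # errors="replace" keeps the scan going past odd bytes in a log.
        with open(path, errors="replace") as fh:
            hits = scan(fh)
        print(path, dict(hits) or "no matches")
```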
St.Ack
Rui Xing wrote:
Hi All,
1). We are doing performance testing on hbase. The test environment is
3 data nodes and 1 name node distributed over 4 machines. We started one
region server on each data node. To insert the data, one insertion client
is started on each data node machine. But as the data was inserted, the
region servers crashed one by one. One of the reasons is listed as
follows:
*==>
2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
while reading from blk_-806310822584979460 of
/hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*
... ...
*2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
obtain block blk_-806310822584979460 from any node: java.io.IOException
2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region search1,r3_1_3_c157476,1223360357528 in
18mins, 39sec
2008-10-07 14:52:25,238 INFO
org.apache.hadoop.hbase.regionserver.CompactSplitThread:
regionserver/0.0.0.0:60020.compactor exiting
2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed search1,r3_1_3_c157476,1223360357528
2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion:
closed -ROOT-,,0
2008-10-07 14:52:25,291 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
10.2.6.104:60020
2008-10-07 14:52:25,291 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
0.0.0.0:60020 exiting
2008-10-07 14:52:25,511 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
thread.
2008-10-07 14:52:25,511 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
===<
2). Another question: under what circumstances will the region server
print thread information like that below? It appears among the normal
log records.
===>
35 active threads
Thread 1281 (IPC Client connection to d3v1.corp.alimama.com/10.2.6.101:54310
):
State: RUNNABLE
Blocked count: 0
Waited count: 0
Stack:
java.util.Hashtable.remove(Hashtable.java:435)
org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
... ...
===<
We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated if any
clues can be dropped.
Regards,
-Ray