Hi. I'll send the logs a little later, along with answers to all your questions. But what do you mean by "You have upped your file descriptors?"
Best Regards.

On Wed, Oct 8, 2008 at 11:41 PM, stack <[EMAIL PROTECTED]> wrote:

> You have DEBUG enabled? Can I see log from the regionserver that went
> down? Can you tell me more about your cluster? Number of nodes, number of
> regions? What your uploader looks like (is it a MR job)? You have upped
> your file descriptors?
>
> Thanks Slava.
> St.Ack
>
> Slava Gorelik wrote:
>
>> Hi. I'm also encountering an error like this.
>> I'm using HBase 0.18.0 and Hadoop 0.18.0.
>> In addition to this error, region servers sometimes die. In the log I
>> see the region server shut down after starting a compaction, because
>> some data blocks are not found.
>>
>> Best Regards.
>>
>> On Wed, Oct 8, 2008 at 11:29 PM, stack <[EMAIL PROTECTED]> wrote:
>>
>>> You should update to 0.2.1 if you can. Make sure you've upped your file
>>> descriptors too: See http://wiki.apache.org/hadoop/Hbase/FAQ#6. Also see
>>> how to enable DEBUG in same FAQ.
>>>
>>> Something odd is up when you see messages like this out of HDFS: ': No
>>> live nodes contain current block*'. That's lost data.
>>>
>>> Or messages like this, 'compaction completed on region
>>> search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. that
>>> compactions are taking so long -- would seem to indicate your machines
>>> are severely overloaded or underpowered or both. Can you study load when
>>> the upload is running on these machines? Perhaps try throttling back to
>>> see if hbase survives longer?
>>>
>>> The regionserver will output thread dump in its RPC layer if critical
>>> error -- OOME -- or it's been hung up for a long time IIRC.
>>>
>>> Check the '.out' logs too for your hbase install to see if they contain
>>> any errors. Grep the datanode logs too for OOME or "too many open file
>>> handles".
>>>
>>> St.Ack
>>>
>>> Rui Xing wrote:
>>>
>>>> Hi All,
>>>>
>>>> 1). We are doing performance testing on hbase. The environment of the
>>>> testing is 3 data nodes, and 1 name node distributed on 4 machines. We
>>>> started one region server on each data node respectively. To insert the
>>>> data, one insertion client is started on each data node machine. But as
>>>> the data inserted, the region servers crashed one by one. One of the
>>>> reasons is listed as follows:
>>>>
>>>> *==>
>>>> 2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception while reading from blk_-806310822584979460 of /hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from 10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*
>>>>
>>>> ... ...
>>>>
>>>> *2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not obtain block blk_-806310822584979460 from any node: java.io.IOExceptionYou
>>>>
>>>> 2008-10-07 14:52:25,229 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec
>>>> 2008-10-07 14:52:25,238 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/0.0.0.0:60020.compactor exiting
>>>> 2008-10-07 14:52:25,284 INFO org.apache.hadoop.hbase.regionserver.HRegion: closed search1,r3_1_3_c157476,1223360357528
>>>> 2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegion: closed -ROOT-,,0
>>>> 2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: 10.2.6.104:60020
>>>> 2008-10-07 14:52:25,291 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/0.0.0.0:60020 exiting
>>>> 2008-10-07 14:52:25,511 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
>>>> 2008-10-07 14:52:25,511 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>>>> ===<
>>>>
>>>> 2). Another question is, under what circumstance will the region server
>>>> print logs of the thread information as below? It appears among the
>>>> normal log records.
>>>> ===>
>>>> 35 active threads
>>>> Thread 1281 (IPC Client connection to d3v1.corp.alimama.com/10.2.6.101:54310):
>>>> State: RUNNABLE
>>>> Blocked count: 0
>>>> Waited count: 0
>>>> Stack:
>>>> java.util.Hashtable.remove(Hashtable.java:435)
>>>> org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
>>>> ... ...
>>>> ===<
>>>>
>>>> We use hadoop 0.17.1 and hbase 0.2.0. It would be greatly appreciated
>>>> if any clues can be dropped.
>>>>
>>>> Regards,
>>>> -Ray
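
On the file descriptor question: "upping file descriptors" in this thread most likely refers to raising the per-process limit on open files (the value `ulimit -n` reports) for the user running the datanode and regionserver. Below is a rough sketch, not anything from the HBase FAQ itself, of how one might check that limit and scan a log for the two error types mentioned above. The function names, the 1024 threshold (a common Linux default), and the exact match strings ("OutOfMemoryError", "Too many open file") are assumptions for illustration only.

```python
import re
import resource

# Assumption: 1024 is a common default soft limit on Linux; treat it (or
# anything lower) as "probably not raised yet". Not an official HBase value.
LOW_LIMIT = 1024

# Assumption: OOME shows up in Java logs as "OutOfMemoryError", and the file
# handle problem as "Too many open files". Matching is case-insensitive.
ERROR_PATTERN = re.compile(r"OutOfMemoryError|too many open file", re.IGNORECASE)


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def looks_low(soft_limit):
    """True if the soft limit is finite and at or below LOW_LIMIT."""
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return soft_limit <= LOW_LIMIT


def find_resource_errors(lines):
    """Return the log lines that mention OOME or open-file exhaustion."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("open file limit: soft=%s hard=%s" % (soft, hard))
    if looks_low(soft):
        print("soft limit looks like an unraised default")
```

Run it as the same user that starts the daemons, since limits are per user and per process. To scan a log, pass an open file to `find_resource_errors`, for example `find_resource_errors(open(path))` where `path` is whichever datanode or regionserver log you want to check.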
