Re: region server problem

Slava Gorelik Wed, 08 Oct 2008 15:07:52 -0700

Hi.
I'll send the log a little later, along with answers to all your questions, but
what do you mean by "You have upped your file descriptors?"?


Best Regards.
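
P.S. For context on the question above: "upping file descriptors" refers to raising the per-process limit on open files (what `ulimit -n` reports in a shell). The sketch below is not from the thread. It is a minimal Python check of the current limit on a Unix-like machine, and the 1024 threshold is an assumption based on a common Linux default, not a value stated here.

```python
import resource  # Unix-only standard library module


def fd_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def looks_too_low(soft, threshold=1024):
    """Heuristic: flag a soft limit at or below `threshold`.

    The threshold is an assumption (a common Linux default), not a
    number taken from the thread.
    """
    if soft == resource.RLIM_INFINITY:
        return False  # no limit configured
    return soft <= threshold


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    if looks_too_low(soft):
        print("soft limit looks low for a loaded regionserver/datanode")
```

Run it as the same user that starts the HBase and Hadoop daemons, since the limit is per user and per process.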


On Wed, Oct 8, 2008 at 11:41 PM, stack <[EMAIL PROTECTED]> wrote:

> You have DEBUG enabled?  Can I see log from the regionserver that went
> down?  Can you tell me more about your cluster? Number of nodes, number of
> regions?  What your uploader looks like (is it a MR job)?  You have upped
> your file descriptors?
>
> Thanks Slava.
> St.Ack
>
>
>
> Slava Gorelik wrote:
>
>> Hi. I'm also encountering an error like this.
>> I'm using HBase 0.18.0 and Hadoop 0.18.0.
>> In addition to this error, region servers sometimes die. In the log I see
>> the region server shut down after starting a compaction, because some data
>> blocks are not found.
>>
>> Best Regards.
>>
>> On Wed, Oct 8, 2008 at 11:29 PM, stack <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>>> You should update to 0.2.1 if you can.  Make sure you've upped your file
>>> descriptors too:  See http://wiki.apache.org/hadoop/Hbase/FAQ#6.  Also see
>>> how to enable DEBUG in the same FAQ.
>>>
>>> Something odd is up when you see messages like this out of HDFS: ': No live
>>> nodes contain current block*'.  That's lost data.
>>>
>>> Or messages like this, 'compaction completed on region
>>> search1,r3_1_3_c157476,1223360357528 in 18mins, 39sec' -- i.e. that
>>> compactions are taking so long -- would seem to indicate your machines are
>>> severely overloaded or underpowered or both.  Can you study load when the
>>> upload is running on these machines?  Perhaps try throttling back to see if
>>> hbase survives longer?
>>>
>>> The regionserver will output a thread dump in its RPC layer on a critical
>>> error -- OOME -- or if it's been hung up for a long time, IIRC.
>>>
>>> Check the '.out' logs too for your hbase install to see if they contain any
>>> errors.  Grep the datanode logs too for OOME or "too many open file
>>> handles".
>>>
>>> St.Ack
>>>
>>> Rui Xing wrote:
>>>
>>>
>>>
>>>> Hi All,
>>>>
>>>> 1). We are doing performance testing on hbase. The testing environment is
>>>> 3 data nodes and 1 name node distributed on 4 machines. We started one
>>>> region server on each data node. To insert the data, one insertion client
>>>> is started on each data node machine. But as the data was inserted, the
>>>> region servers crashed one by one. One of the reasons is listed as follows:
>>>>
>>>> *==>
>>>> 2008-10-07 14:47:01,519 WARN org.apache.hadoop.dfs.DFSClient: Exception
>>>> while reading from blk_-806310822584979460 of
>>>> /hbase/search1/1201761134/col9/mapfiles/3578469984425427480/data from
>>>> 10.2.6.102:50010: java.io.IOException: Premeture EOF from inputStream*
>>>>
>>>> ... ...
>>>>
>>>> *2008-10-07 14:47:01,521 INFO org.apache.hadoop.dfs.DFSClient: Could not
>>>> obtain block blk_-806310822584979460 from any node:
>>>>  java.io.IOExceptionYou
>>>>
>>>> 2008-10-07 14:52:25,229 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>> compaction completed on region search1,r3_1_3_c157476,1223360357528 in
>>>> 18mins, 39sec
>>>> 2008-10-07 14:52:25,238 INFO
>>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
>>>> regionserver/0.0.0.0:60020.compactor exiting
>>>> 2008-10-07 14:52:25,284 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>> closed search1,r3_1_3_c157476,1223360357528
>>>> 2008-10-07 14:52:25,291 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>> closed -ROOT-,,0
>>>> 2008-10-07 14:52:25,291 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
>>>> 10.2.6.104:60020
>>>> 2008-10-07 14:52:25,291 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
>>>> 0.0.0.0:60020 exiting
>>>> 2008-10-07 14:52:25,511 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
>>>> thread.
>>>> 2008-10-07 14:52:25,511 INFO
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread
>>>> complete
>>>> ===<
>>>>
>>>> 2). Another question is, under what circumstances will the region server
>>>> print logs of the thread information as below? It appears among the normal
>>>> log records.
>>>> ===>
>>>> 35 active threads
>>>> Thread 1281 (IPC Client connection to
>>>> d3v1.corp.alimama.com/10.2.6.101:54310
>>>> ):
>>>>  State: RUNNABLE
>>>>  Blocked count: 0
>>>>  Waited count: 0
>>>>  Stack:
>>>>   java.util.Hashtable.remove(Hashtable.java:435)
>>>>   org.apache.hadoop.ipc.Client$Connection.run(Client.java:297)
>>>> ... ...
>>>> ===<
>>>>
>>>> We use hadoop 0.17.1 and hbase 0.2.0. Any clues would be greatly
>>>> appreciated.
>>>>
>>>> Regards,
>>>> -Ray
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
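
The advice quoted above to grep the regionserver and datanode logs can be sketched as a small Python scanner. This is an illustration only: the pattern names are hypothetical, the regular expressions are assumptions built from the messages mentioned in this thread plus the standard Java `OutOfMemoryError` class name, and real log wording may differ between versions.

```python
import re

# Hypothetical pattern set; adjust to the wording your logs actually use.
PATTERNS = {
    "oome": re.compile(r"OutOfMemoryError"),
    "open_files": re.compile(r"too many open file", re.IGNORECASE),
    "lost_block": re.compile(
        r"Could not obtain block|No live nodes contain current block"
    ),
}


def scan_lines(lines):
    """Return {pattern_name: [1-based line numbers that matched]}."""
    hits = {name: [] for name in PATTERNS}
    for number, line in enumerate(lines, 1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(number)
    return hits


def scan_file(path):
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return scan_lines(handle)
```

Calling `scan_file` on each '.out' and '.log' file of the regionservers and datanodes gives a quick per-file summary of which of these conditions appeared and on which lines.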
