Thanks Devaraj. We have waited a couple of hours. We are waiting for the next event to get more details; it should not be long.
Memory so far has not been a problem; we allocate 10GB to each regionserver and usage tends to peak around 1GB, up from 350MB when idle. The region load is quite small, only 11 small ~3MB regions per server. The servers themselves are decent, new 12-core/12-spindle boxes with 128GB of RAM running CentOS 6.5.

-chris

On May 12, 2014, at 6:40 PM, Devaraj Das <d...@hortonworks.com> wrote:

> How much time do you wait for the RegionServers to come back? It seems many handlers are busy processing GETs and DELETEs. I don't think that 60 handlers is high if you have decent memory in the regionservers (how much are they running with? Could they be GC'ing a lot, leading to unresponsiveness?).
>
> On Mon, May 12, 2014 at 5:08 PM, Christopher Tarnas <c...@biotiquesystems.com> wrote:
>> Hi Jeffrey,
>>
>> Thank you. I don't believe we changed the number of handlers from the default, but we'll double-check. What preceded the most recent event (not the event from the earlier stack trace we just sent) was the developers issuing some "delete *" statements for several tables.
>>
>> -chris
>>
>>> On May 12, 2014, at 3:32 PM, Jeffrey Zhong <jzh...@hortonworks.com> wrote:
>>>
>>> From the stack, it seems you increased the default RPC handler count to about 60. All handlers are serving Get requests (you can search for org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2841)).
>>>
>>> You can check why there are so many Get requests by adding some logging or enabling HBase RPC trace. I suspect that decreasing the number of RPC handlers per regionserver will mitigate your current issue.
>>>
>>>> On 5/12/14 2:28 PM, "Chris Tarnas" <c...@biotiquesystems.com> wrote:
>>>>
>>>> We have hit a problem with Phoenix: regionserver CPU usage spikes up to use all available CPU and the regionservers become unresponsive.
>>>>
>>>> After HDP 2.1 was released, we set up a 4-compute-node cluster (with 3 VMware "master" nodes) to test out Phoenix on it. It is a plain Ambari 1.5/HDP 2.1 install; we added the HDP Phoenix RPM release and hand-linked the jar files into the Hadoop lib. Everything was going well and we were able to load ~30k records into several tables. After about 3-4 days of uptime, the regionservers became unresponsive and started to use most of the available CPU (12-core boxes). Nothing terribly informative was in the logs (initially we saw some flush messages that seemed excessive, but that was not all of the time, and we changed back to the standard HBase WAL codec). We are able to kill the unresponsive regionservers and restart them; the cluster will be fine for a day or so but will start to lock up again.
>>>>
>>>> We've dropped the entire HBase and ZooKeeper information and started from scratch, but that has not helped.
>>>>
>>>> James Taylor suggested I send this off here. I've attached a jstack report of a locked-up regionserver in hopes that someone can shed some light.
>>>>
>>>> thanks,
>>>> -chris
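
For reference, a minimal sketch of how a thread dump like the one attached can be captured from a locked-up regionserver and scanned for busy Get handlers (the file name is illustrative; run as the user that owns the regionserver JVM):

    # find the regionserver JVM and dump its threads
    RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
    jstack -l "$RS_PID" > rs-$(hostname).jstack

    # count handler threads currently inside HRegionServer.get
    grep -c 'org.apache.hadoop.hbase.regionserver.HRegionServer.get' rs-*.jstack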
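
Along the lines of Jeffrey's suggestion, a sketch of the two knobs involved, assuming a stock HDP 2.1 layout (the value shown is only an example, and the exact logger package should be double-checked against your HBase version). In hbase-site.xml on the regionservers:

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>30</value>  <!-- lower than the ~60 handlers seen in the stack dump -->
    </property>

and, temporarily, RPC-level logging can be raised in log4j.properties to see what the Get traffic actually is (very verbose, so only for a short window):

    log4j.logger.org.apache.hadoop.hbase.ipc=DEBUG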
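
On Devaraj's GC question, one way to rule garbage collection in or out is to enable GC logging for the regionserver JVMs, e.g. in hbase-env.sh (standard HotSpot flags; the log path is illustrative):

    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-regionserver.log"

Long or back-to-back pauses in that log while the CPUs are pegged would point at GC rather than at the Get/Delete handlers themselves.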