Hey Chris,

I used the performance.py tool to create a table with 50K rows, ran the
following query from sqlline.py, and everything looks fine; I don't see
the CPU running hot.

0: jdbc:phoenix:hor11n21.gq1.ygridcore.net> select count(*) from
PERFORMANCE_50000;
+------------+
|  COUNT(1)  |
+------------+
| 50000      |
+------------+
1 row selected (0.166 seconds)
0: jdbc:phoenix:hor11n21.gq1.ygridcore.net> select count(*) from
PERFORMANCE_50000;
+------------+
|  COUNT(1)  |
+------------+
| 50000      |
+------------+
1 row selected (0.167 seconds)
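
For reference, this is roughly how the table above was generated (a
sketch only: "localhost" stands in for your ZooKeeper quorum, and
performance.py is the script shipped in the Phoenix bin/ directory; it
creates the PERFORMANCE_<rowcount> table and loads that many rows):

# generate and load 50K rows, then open a sqlline session on the same quorum
./performance.py localhost 50000
./sqlline.py localhost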

Is there any way you could run a profiler to see where the CPU goes?
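
If a full profiler isn't handy, a rough sketch like the following is
usually enough to show which threads are burning CPU (it assumes top,
jps and jstack are available on the regionserver host; <hot-tid> is the
decimal thread id you pick out of the top output):

RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
top -H -b -n 1 -p $RS_PID | head -40   # per-thread CPU usage
printf '0x%x\n' <hot-tid>              # hex form matches nid=0x... in jstack
jstack $RS_PID > rs.jstack             # then find that nid in rs.jstack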



On 5/13/14 6:40 PM, "Chris Tarnas" <c...@biotiquesystems.com> wrote:

>Ahh, yes. Here is a pastebin for it:
>
>http://pastebin.com/w6mtabag
>
>thanks again,
>-chris
>
>On May 13, 2014, at 7:47 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
>> Hi Chris,
>> 
>> Attachments are filtered out by the mail server. Can you pastebin it
>> some place?
>> 
>> Thanks,
>> Nick
>> 
>> 
>> On Tue, May 13, 2014 at 2:56 PM, Chris Tarnas
>> <c...@biotiquesystems.com> wrote:
>> 
>>> Hello,
>>> 
>>> We set the HBase RegionServer handler count to 10 (it appears to have
>>> been set to 60 by Ambari during the install process). Now we have
>>> narrowed down what causes the CPU to increase and have some detailed
>>> logs:
>>> 
>>> If we connect using sqlline.py and execute a select that fetches a
>>> single row by the primary key, no increase in CPU is observed and the
>>> number of RPC handler threads in a RUNNABLE state remains the same.
>>> 
>>> If we execute a select that scans the table, such as "select count(*)
>>> from TABLE", or one whose "where" clause only filters on non-primary-key
>>> attributes, then the number of RUNNABLE RpcServer.handler threads
>>> increases and the CPU utilization of the regionserver increases by
>>> ~105%.
>>> 
>>> Disconnecting the client has no effect: the RpcServer.handler thread
>>> is left RUNNABLE and the CPU stays at the higher usage.
>>> 
>>> Checking the web console for the regionserver shows just 10
>>> RpcServer.reader tasks, all in a WAITING state; no other monitored
>>> tasks are running. The regionserver has a max heap of 10G and a used
>>> heap of 445.2M.
>>> 
>>> I've attached the regionserver log with IPC debug logging turned on
>>> right when one of the Phoenix statements is executed (this statement
>>> actually used up the last available handler).
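>>> 
>>> (For anyone reproducing this: one way to toggle that logger without a
>>> restart is the daemon's logLevel servlet; the port below assumes the
>>> default regionserver info port of 60030, and the URL form is an
>>> assumption based on the stock Hadoop servlet:)
>>> 
>>>   curl 'http://<rs-host>:60030/logLevel?log=org.apache.hadoop.hbase.ipc&level=DEBUG'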
>>> 
>>> thanks,
>>> -chris
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On May 12, 2014, at 5:32 PM, Jeffrey Zhong <jzh...@hortonworks.com>
>>> wrote:
>>> 
>>>> 
>>>> From the stack, it seems you increased the default RPC handler number
>>>> to about 60. All handlers are serving Get requests (you can search for
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2841)).
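>>>> 
>>>> A quick way to confirm that against a saved dump (assuming it is in a
>>>> file called rs.jstack):
>>>> 
>>>>   # handler threads currently inside HRegionServer.get()
>>>>   grep -c 'HRegionServer.get(HRegionServer.java' rs.jstack
>>>>   # handler threads that are RUNNABLE overall
>>>>   grep -A 2 '"RpcServer.handler' rs.jstack | grep -c 'RUNNABLE'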
>>>> 
>>>> You can check why there are so many Get requests by adding some log
>>>> info or enabling HBase RPC trace. I suspect that decreasing the number
>>>> of RPC handlers per region server will mitigate your current issue.
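>>>> 
>>>> Concretely, the handler pool size is hbase.regionserver.handler.count in
>>>> hbase-site.xml (a regionserver restart is needed), and RPC trace logging
>>>> can be turned on with a log4j override; a sketch against a stock config:
>>>> 
>>>>   <!-- hbase-site.xml on each regionserver -->
>>>>   <property>
>>>>     <name>hbase.regionserver.handler.count</name>
>>>>     <value>10</value>
>>>>   </property>
>>>> 
>>>>   # conf/log4j.properties
>>>>   log4j.logger.org.apache.hadoop.hbase.ipc=TRACE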
>>>> 
>>>> 
>>>> On 5/12/14 2:28 PM, "Chris Tarnas" <c...@biotiquesystems.com> wrote:
>>>> 
>>>>> We have hit a problem with Phoenix: regionserver CPU usage spikes up
>>>>> to consume all available CPU and the regionservers become
>>>>> unresponsive.
>>>>> 
>>>>> After HDP 2.1 was released we set up a 4 compute node cluster (with 3
>>>>> VMware "master" nodes) to test out Phoenix on it. It is a plain Ambari
>>>>> 1.5/HDP 2.1 install; we added the HDP Phoenix RPM release and hand
>>>>> linked the jar files into the hadoop lib. Everything was going well and
>>>>> we were able to load ~30k records into several tables. What happened
>>>>> was that after about 3-4 days of being up, the regionservers became
>>>>> unresponsive and started to use most of the available CPU (12 core
>>>>> boxes). Nothing terribly informative was in the logs (initially we saw
>>>>> some flush messages that seemed excessive, but that was not all of the
>>>>> time, and we changed back to the standard HBase WAL codec). We are able
>>>>> to kill the unresponsive regionservers and then restart them; the
>>>>> cluster will be fine for a day or so but will start to lock up again.
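>>>>> 
>>>>> (The hand linking was essentially putting the Phoenix server-side jar
>>>>> on the regionserver classpath; something along the lines of the sketch
>>>>> below, where the paths are only assumptions about a typical HDP RPM
>>>>> layout:)
>>>>> 
>>>>>   # assumed RPM locations; adjust to your install
>>>>>   ln -s /usr/lib/phoenix/lib/phoenix-*.jar /usr/lib/hadoop/lib/
>>>>> 
>>>>> followed by a regionserver restart so the jar is picked up.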
>>>>> 
>>>>> We've dropped the entire HBase and ZooKeeper state and started from
>>>>> scratch, but that has not helped.
>>>>> 
>>>>> James Taylor suggested I send this off here. I've attached a jstack
>>>>> report of a locked-up regionserver in hopes that someone can shed some
>>>>> light.
>>>>> 
>>>>> thanks,
>>>>> -chris
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 


