Thanks Devaraj. We have waited a couple of hours. We are waiting for the next event to get more details; it should not be long.
Memory so far has not been a problem; we allocate 10GB to each regionserver and usage tends to peak around 1GB, up from 350MB when idle. The region load is quite small, only 11 small ~3MB regions per server. The servers themselves are decent, new 12-core/12-spindle boxes with 128GB of RAM running CentOS 6.5.

-chris

On May 12, 2014, at 6:40 PM, Devaraj Das <d...@hortonworks.com> wrote:

> How much time do you wait for the RegionServers to come back? It seems many handlers are busy processing GETs and DELETEs. I don't think that 60 handlers is high if you have decent memory in the regionservers (how much are they running with? Could they be GC'ing a lot, leading to unresponsiveness?).
>
> On Mon, May 12, 2014 at 5:08 PM, Christopher Tarnas <c...@biotiquesystems.com> wrote:
>> Hi Jeffrey,
>>
>> Thank you. I don't believe we changed the number of handlers from the default, but we'll double-check. What preceded the most recent event (not the event from the earlier stack trace we just sent) was the developers issuing some "delete *" statements for several tables.
>>
>> -chris
>>
>>> On May 12, 2014, at 3:32 PM, Jeffrey Zhong <jzh...@hortonworks.com> wrote:
>>>
>>> From the stack, it seems you increased the default RPC handler count to about 60. All handlers are serving Get requests (you can search for org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2841)).
>>>
>>> You can check why there are so many Get requests by adding some logging or enabling HBase RPC trace. I suspect that decreasing the number of RPC handlers per regionserver will mitigate your current issue.
>>>
>>>> On 5/12/14 2:28 PM, "Chris Tarnas" <c...@biotiquesystems.com> wrote:
>>>>
>>>> We have hit a problem with Phoenix: regionserver CPU usage spikes up to use all available CPU and the regionservers become unresponsive.
>>>>
>>>> After HDP 2.1 was released, we set up a 4-compute-node cluster (with 3 VMware "master" nodes) to test out Phoenix on it. It is a plain Ambari 1.5/HDP 2.1 install; we added the HDP Phoenix RPM release and hand-linked the jar files into the Hadoop lib. Everything was going well and we were able to load ~30k records into several tables. After about 3-4 days of uptime, the regionservers became unresponsive and started to use most of the available CPU (12-core boxes). Nothing terribly informative was in the logs (initially we saw some flush messages that seemed excessive, but that was not all of the time, and we changed back to the standard HBase WAL codec). We are able to kill the unresponsive regionservers and restart them; the cluster will be fine for a day or so but will start to lock up again.
>>>>
>>>> We've dropped the entire HBase and ZooKeeper information and started from scratch, but that has not helped.
>>>>
>>>> James Taylor suggested I send this off here. I've attached a jstack report of a locked-up regionserver in hopes that someone can shed some light.
>>>>
>>>> thanks,
>>>> -chris
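
For reference, a minimal sketch of how a thread dump like the one attached can be captured from a locked-up regionserver and scanned for busy Get handlers (the file name is illustrative; run as the user that owns the regionserver JVM):

    # find the regionserver JVM and dump its threads
    RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
    jstack -l "$RS_PID" > rs-$(hostname).jstack

    # count handler threads currently inside HRegionServer.get
    grep -c 'org.apache.hadoop.hbase.regionserver.HRegionServer.get' rs-*.jstack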
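
Along the lines of Jeffrey's suggestion, a sketch of the two knobs involved, assuming a stock HDP 2.1 layout (the value shown is only an example, and the exact logger package should be double-checked against your HBase version). In hbase-site.xml on the regionservers:

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>30</value>  <!-- lower than the ~60 handlers seen in the stack dump -->
    </property>

and, temporarily, RPC-level logging can be raised in log4j.properties to see what the Get traffic actually is (very verbose, so only for a short window):

    log4j.logger.org.apache.hadoop.hbase.ipc=DEBUG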
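
On Devaraj's GC question, one way to rule garbage collection in or out is to enable GC logging for the regionserver JVMs, e.g. in hbase-env.sh (standard HotSpot flags; the log path is illustrative):

    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-regionserver.log"

Long or back-to-back pauses in that log while the CPUs are pegged would point at GC rather than at the Get/Delete handlers themselves.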