Re: regionserver disconnection

Zhenyu Zhong Tue, 01 Dec 2009 10:50:26 -0800

The Java VM version I am running is still 1.6.0_11. We are scheduling an
upgrade soon.


The command we use to start the RegionServer is:
/usr/local/jdk1.6.0_11/bin/java -Xmx4096m -XX:+HeapDumpOnOutOfMemoryError
-XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:+UseFastAccessorMethods
-XX:+PrintTenuringDistribution -XX:MaxTenuringThreshold=6
-XX:SurvivorRatio=6 -XX:CMSInitiatingOccupancyFraction=60
-XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled
-XX:+DisableExplicitGC -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -Xloggc:/data/hbase_logs/gc-hbase.log
-XX:ErrorFile=/data/hbase_logs/java_err.log -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.password.file=/code/hbase-0.20.1/conf/jmxremote.password
-Dcom.sun.management.jmxremote.access.file=/code/hbase-0.20.1/conf/jmxremote.access
-Dcom.sun.management.jmxremote.port=10102 -Dhbase.log.dir=/data/hbase_logs
-Dhbase.log.file=hbase-hadoop-admin-regionserver-superpyxis0006.log
-Dhbase.home.dir=/code/hbase-0.20.1 -Dhbase.id.str=hadoop-admin
-Dhbase.root.logger=INFO,DRFA
-Djava.library.path=/code/hbase-0.20.1/lib/native/Linux-amd64-64 -classpath
/code/hbase-0.20.1/bin/../conf:/usr/local/jdk1.6.0_11/lib/tools.jar:/code/hbase-0.20.1:/code/hbase-0.20.1/hbase-0.20.1.jar:/code/hbase-0.20.1/hbase-0.20.1-test.jar:/code/hbase-0.20.1/lib/AgileJSON-2009-03-30.jar:/code/hbase-0.20.1/lib/commons-cli-2.0-SNAPSHOT.jar:/code/hbase-0.20.1/lib/commons-el-from-jetty-5.1.4.jar:/code/hbase-0.20.1/lib/commons-httpclient-3.0.1.jar:/code/hbase-0.20.1/lib/commons-logging-1.0.4.jar:/code/hbase-0.20.1/lib/commons-logging-api-1.0.4.jar:/code/hbase-0.20.1/lib/commons-math-1.1.jar:/code/hbase-0.20.1/lib/hadoop-0.20.1-hdfs127-core.jar:/code/hbase-0.20.1/lib/hadoop-0.20.1-test.jar:/code/hbase-0.20.1/lib/jasper-compiler-5.5.12.jar:/code/hbase-0.20.1/lib/jasper-runtime-5.5.12.jar:/code/hbase-0.20.1/lib/jetty-6.1.14.jar:/code/hbase-0.20.1/lib/jetty-util-6.1.14.jar:/code/hbase-0.20.1/lib/jruby-complete-1.2.0.jar:/code/hbase-0.20.1/lib/json.jar:/code/hbase-0.20.1/lib/junit-3.8.1.jar:/code/hbase-0.20.1/lib/libthrift-r771587.jar:/code/hbase-0.20.1/lib/log4j-1.2.15.jar:/code/hbase-0.20.1/lib/lucene-core-2.2.0.jar:/code/hbase-0.20.1/lib/servlet-api-2.5-6.1.14.jar:/code/hbase-0.20.1/lib/xmlenc-0.52.jar:/code/hbase-0.20.1/lib/zookeeper-3.2.1.jar:/code/hbase-0.20.1/lib/jsp-2.1/jsp-2.1.jar:/code/hbase-0.20.1/lib/jsp-2.1/jsp-api-2.1.jar:/code/hadoop-0.20.1/conf
org.apache.hadoop.hbase.regionserver.HRegionServer start
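
For reference, the logging flags in the command above (-verbose:gc,
-XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, -Xloggc) make it possible to
summarize pauses from the GC log with a small script as well as with a viewer.
The following is only a minimal sketch in Python, not something from the
original setup. It assumes each event in the log ends with a
"[Times: user=... sys=..., real=... secs]" suffix, and it skips lines that
mention "CMS-concurrent" on the assumption that those phases do not stop the
application. The function names pause_seconds and summarize are hypothetical,
and real logs (for example ones with interleaved concurrent-mode-failure
output) may need more careful parsing.

```python
import re

# Wall-clock field of the "[Times: user=... sys=..., real=... secs]" suffix
# (assumed log format; adjust if your JVM prints something different).
REAL_RE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def pause_seconds(lines):
    """Yield wall-clock seconds for each line that looks like a GC pause."""
    for line in lines:
        if "CMS-concurrent" in line:
            # Concurrent CMS phases run alongside the application,
            # so they are not counted as pauses here.
            continue
        match = REAL_RE.search(line)
        if match:
            yield float(match.group(1))


def summarize(lines):
    """Return count/total/max/min/avg of pause times, or None if none found."""
    pauses = list(pause_seconds(lines))
    if not pauses:
        return None
    total = sum(pauses)
    return {
        "count": len(pauses),
        "total_s": total,
        "max_s": max(pauses),
        "min_s": min(pauses),
        "avg_s": total / len(pauses),
    }
```

To try it, open the file given to -Xloggc and pass the file object to
summarize(), e.g. summarize(open("/data/hbase_logs/gc-hbase.log")). The
"real=" values are only printed to two decimal places, so very short pauses
will show up as 0.00 and the totals are approximate.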

Please feel free to comment.

Best,
zhenyu


On Tue, Dec 1, 2009 at 12:22 PM, Patrick Hunt <[email protected]> wrote:

> Interesting, remind me, what is your current status of:
>
> java vm version?
>
> options you are providing to the JVM on startup (-XX -Xmx and the like - if
> you could provide the exact command line you use to start the jvm that would
> be nice to see)
>
> FYI: I've seen issues with use of incremental gc prior to 1.6.0_17 (in
> particular jvm crashes), I haven't seen any issues with this latest version
> (yet). This was on 64-bit Linux.
>
> Patrick
>
>
> Zhenyu Zhong wrote:
>
>> So far I have been using gchisto to view the GC log.
>> In my last RS disconnection, I saw a total GC time of about 457 seconds. But
>> individually, the max pause is 1340 ms, the min is 0.527 ms, and the avg is 48 ms.
>>
>> The RS disconnection might be due to other reasons. I think J-D has been
>> digging that.
>>
>> thanks
>> zhenyu
>>
>>
>>
>> On Mon, Nov 30, 2009 at 9:22 PM, stack <[email protected]> wrote:
>>
>>> I suppose up to this I thought it a given for any Java application that
>>> wants to do realtime, whether a webserver or search application, but yeah,
>>> we should do more to highlight the import of GC tuning, especially when
>>> failure to do so can be relatively catastrophic (a RegionServer shutting
>>> itself down).  Ryan in particular has been doing a bunch of talking up of
>>> the topic (he did our performance tuning wiki page too).  We could start up
>>> a list of use cases and the tunings that helped alleviate GC woes for a
>>> particular cluster profile and loading (so we'd have something to present
>>> at BAHUG?  Do you know who we might talk to regarding pauses in the MR/HDFS
>>> team, Patrick?  We were introduced to the NameNode Tuner once... we should
>>> talk to him again).  It does seem to be a problem where one tuning does not
>>> suit all deploys.
>>>
>>> Regarding Zhenyu's case, there is still work to do IMO.  What I saw in his
>>> logs was a failed promotion from ParNew, something that could be helped by
>>> starting CMS collection earlier (among other things).  He's also still on
>>> an older version of the JVM.  While things are not timing out at the
>>> moment, IMO it's still 'broke' if it has such long pauses (Zhenyu, in your
>>> GC logs, are you seeing 4-minute pauses?).  Ryan would argue these are
>>> inevitable with CMS -- but at least in the one case that I saw, some
>>> twiddling would seem to help.
>>>
>>> Thanks Patrick,
>>> St.Ack
>>>
>>>
>>
