The Java VM version I am running is still 1.6.0_11. We are scheduling an upgrade soon.
The command we use to start the RegionServer is: /usr/local/jdk1.6.0_11/bin/java -Xmx4096m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -XX:+PrintTenuringDistribution -XX:MaxTenuringThreshold=6 -XX:SurvivorRatio=6 -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/data/hbase_logs/gc-hbase.log -XX:ErrorFile=/data/hbase_logs/java_err.log -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.password.file=/code/hbase-0.20.1/conf/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/code/hbase-0.20.1/conf/jmxremote.access -Dcom.sun.management.jmxremote.port=10102 -Dhbase.log.dir=/data/hbase_logs -Dhbase.log.file=hbase-hadoop-admin-regionserver-superpyxis0006.log -Dhbase.home.dir=/code/hbase-0.20.1 -Dhbase.id.str=hadoop-admin -Dhbase.root.logger=INFO,DRFA -Djava.library.path=/code/hbase-0.20.1/lib/native/Linux-amd64-64 -classpath 
/code/hbase-0.20.1/bin/../conf:/usr/local/jdk1.6.0_11/lib/tools.jar:/code/hbase-0.20.1:/code/hbase-0.20.1/hbase-0.20.1.jar:/code/hbase-0.20.1/hbase-0.20.1-test.jar:/code/hbase-0.20.1/lib/AgileJSON-2009-03-30.jar:/code/hbase-0.20.1/lib/commons-cli-2.0-SNAPSHOT.jar:/code/hbase-0.20.1/lib/commons-el-from-jetty-5.1.4.jar:/code/hbase-0.20.1/lib/commons-httpclient-3.0.1.jar:/code/hbase-0.20.1/lib/commons-logging-1.0.4.jar:/code/hbase-0.20.1/lib/commons-logging-api-1.0.4.jar:/code/hbase-0.20.1/lib/commons-math-1.1.jar:/code/hbase-0.20.1/lib/hadoop-0.20.1-hdfs127-core.jar:/code/hbase-0.20.1/lib/hadoop-0.20.1-test.jar:/code/hbase-0.20.1/lib/jasper-compiler-5.5.12.jar:/code/hbase-0.20.1/lib/jasper-runtime-5.5.12.jar:/code/hbase-0.20.1/lib/jetty-6.1.14.jar:/code/hbase-0.20.1/lib/jetty-util-6.1.14.jar:/code/hbase-0.20.1/lib/jruby-complete-1.2.0.jar:/code/hbase-0.20.1/lib/json.jar:/code/hbase-0.20.1/lib/junit-3.8.1.jar:/code/hbase-0.20.1/lib/libthrift-r771587.jar:/code/hbase-0.20.1/lib/log4j-1.2.15.jar:/code/hbase-0.20.1/lib/lucene-core-2.2.0.jar:/code/hbase-0.20.1/lib/servlet-api-2.5-6.1.14.jar:/code/hbase-0.20.1/lib/xmlenc-0.52.jar:/code/hbase-0.20.1/lib/zookeeper-3.2.1.jar:/code/hbase-0.20.1/lib/jsp-2.1/jsp-2.1.jar:/code/hbase-0.20.1/lib/jsp-2.1/jsp-api-2.1.jar:/code/hadoop-0.20.1/conf org.apache.hadoop.hbase.regionserver.HRegionServer start

Please feel free to comment.

Best,
zhenyu

On Tue, Dec 1, 2009 at 12:22 PM, Patrick Hunt <[email protected]> wrote:

> Interesting, remind me, what is your current status of:
>
> java vm version?
>
> options you are providing to the JVM on startup (-XX -Xmx and the like - if
> you could provide the exact command line you use to start the jvm that would
> be nice to see)
>
> FYI: I've seen issues with use of incremental gc prior to 1.6.0_17 (in
> particular jvm crashes), I haven't seen any issues with this latest version
> (yet). This was on 64bit linux.
>
> Patrick
>
>
> Zhenyu Zhong wrote:
>
>> So far I have been using gchisto to view the gc-log.
>> In my last RS disconnection, I saw a total GC about 457 seconds. But
>> individually, the max is 1340 ms, min is 0.527ms, avg is 48ms.
>>
>> The RS disconnection might be due to other reasons. I think J-D has been
>> digging that.
>>
>> thanks
>> zhenyu
>>
>>
>>
>> On Mon, Nov 30, 2009 at 9:22 PM, stack <[email protected]> wrote:
>>
>>> I suppose up to this I thought it a given for any java application that
>>> wants to do realtime whether a webserver or search application but yeah, we
>>> should do more to highlight the import of GC tuning especially when failure
>>> to do so can be relatively catastrophic (A RegionServer self-shutting itself
>>> down). Ryan in particular has been doing a bunch of talking up of the topic
>>> (He did our performance tuning wiki page too). We could start up a list of
>>> use cases and the tunings that helped alleviate GC woes for a particular
>>> cluster profile and loading (So we'd have something to present at BAHUG? Do
>>> you know who we might talk to regards pauses in the MR/HDFS team Patrick?
>>> We were introduced to the NameNode Tuner once... we should talk to him
>>> again). It does seem to be a problem where one tuning does not suit all
>>> deploys.
>>>
>>> Regards Zhenyu's case, there is still work to do IMO. What I saw in his
>>> logs was a failed promotion from parnew, something that could be helped
>>> starting CMS collection earlier (among other things). Hes also still on an
>>> older version of the JVM. While things are not timing out at the moment,
>>> IMO its still 'broke' if it has such long pauses (Zhenyu, in your GC logs,
>>> are you seeing 4 minutes pause?). Ryan would argue these are inevitable
>>> with CMS -- but at least in the one case that I saw some twiddling would
>>> seem to help.
>>>
>>> Thanks Patrick,
>>> St.Ack
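
As a rough cross-check of the gchisto figures quoted in this thread (total, max, min, avg), the pause times can also be tallied straight from the log written by -Xloggc. The Python sketch below is only an illustration, not part of anyone's setup here: the function name and the sample lines are hypothetical, and it assumes each logged event carries the "[Times: user=... sys=..., real=N secs]" suffix that -XX:+PrintGCDetails emits on HotSpot 1.6. It skips CMS-concurrent phases, since those run alongside the application and are not pauses. The real= field is only printed to 10 ms resolution, so a sub-millisecond minimum like the one gchisto reports would not be reproduced.

```python
import re

# Wall-clock field appended to each logged GC event (assumed log format).
REAL_SECS = re.compile(r"real=(\d+\.\d+) secs")

def summarize_gc_times(lines):
    """Return (count, total_s, min_s, max_s, avg_s) over the real= fields.

    Hypothetical helper: lines for CMS-concurrent phases are skipped because
    they describe concurrent work rather than stop-the-world pauses.
    """
    times = []
    for line in lines:
        if "CMS-concurrent" in line:
            continue
        times.extend(float(m.group(1)) for m in REAL_SECS.finditer(line))
    if not times:
        return (0, 0.0, 0.0, 0.0, 0.0)
    total = sum(times)
    return (len(times), total, min(times), max(times), total / len(times))

# Made-up sample lines in the assumed format, for illustration only.
SAMPLE = [
    "10.100: [GC 10.100: [ParNew: 1000K->100K(2000K), 0.0100000 secs] "
    "5000K->4100K(9000K), 0.0101000 secs] "
    "[Times: user=0.02 sys=0.00, real=0.01 secs]",
    "12.000: [CMS-concurrent-mark: 0.400/0.500 secs] "
    "[Times: user=0.90 sys=0.01, real=0.50 secs]",
    "15.300: [GC 15.300: [ParNew: 1100K->120K(2000K), 1.3400000 secs] "
    "5200K->4300K(9000K), 1.3401000 secs] "
    "[Times: user=1.40 sys=0.02, real=1.34 secs]",
]

if __name__ == "__main__":
    count, total, lo, hi, avg = summarize_gc_times(SAMPLE)
    print("events=%d total=%.2fs min=%.2fs max=%.2fs avg=%.3fs"
          % (count, total, lo, hi, avg))
```

To run it against a real log, pass an open file instead of SAMPLE, for example summarize_gc_times(open("/data/hbase_logs/gc-hbase.log")).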
