Thanks Chris! This is really helpful.

Bryan Wooten
Tel: (801)585-9323
Email: [email protected]

From: Christopher Myers [mailto:[email protected]]
Sent: Monday, August 31, 2015 2:35 PM
To: [email protected]; Bryan Wooten
Subject: Re: [cas-user] Hazelcast / Slow CAS

In the past when I've run into things like this, I've started a VNC session on the server and let jvisualvm watch the tomcat process so that it could give me statistics on gc activity.

For memory tuning, I spent roughly two months slowly tweaking the config for our (very active) cluster nodes (which also host our webmail and campus portal), and came up with:

    -Xms6g -Xmx6g -Xss512k
    -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
    -XX:+UseCompressedOops
    -XX:MaxPermSize=256m
    -XX:NewRatio=3
    -XX:SurvivorRatio=8
    -XX:+UseConcMarkSweepGC
    -XX:+UseParNewGC
    -XX:+DisableExplicitGC
    -XX:+UseCMSInitiatingOccupancyOnly
    -XX:+CMSClassUnloadingEnabled
    -XX:+CMSScavengeBeforeRemark
    -XX:CMSInitiatingOccupancyFraction=68

This routinely ends up with regular minor collections, and very few major collections even after an extended period of high use.

For additional monitoring, we also have a home-built diagnostics page (attached) that we run on all of our cluster nodes, polled by our load balancer. It polls things like used db threads, server connections, ldap connections, heap size, gc activity, etc.:

    15:16pm up 19 days 5:55, 0 users, load average: 0.19, 0.31, 0.31

    Connection to PROD ok : connections in use/idle/max: 18/4/25
    Connection to Moodle ok : connections in use/idle/max: 1/1/25
    Connection to Jira ok : connections in use/idle/max: 1/0/25
    Connection to Diebold ok : connections in use/idle/max: 1/1/25

    Connection to LDAP on mulinedir1 is ok.
    Connection to LDAPS on mulinedir1 is ok.
    Connection to LDAP on mulinedir2 is ok.
    Connection to LDAPS on mulinedir2 is ok.

    Java Heap in use/max: 2140M/5990M
    Java non-Heap in use/max: 112M/304M
    Number of Java threads: 177
    Peak Java threads: 226
    Garbage Collection: Copy: 5969
    Garbage Collection: ConcurrentMarkSweep: 11
    Waiting for I/O accept: org.apache.catalina.core.StandardServer

    active internet connections (w/o servers)
    proto recv-q send-q local address            foreign address         state
    tcp    9630      0 muwacnode1.millik:60700  muoradbprod.milli:6010  established
    tcp   10200      0 muwacnode1.millik:54433  muoradbprod.milli:6010  established
    tcp   10200      0 muwacnode1.millik:54433  muoradbprod.milli:6010  established
    tcp       0      0 muwacnode1.millik:44428  muoradbprod.milli:6010  established
    tcp       0      0 localhost:8009           localhost:40585         established
    <snip/>
    ---------------------------
    ESTABLISHED: 104
    TIME_WAIT: 42
    CLOSE_WAIT: 2
    LDAP: 45
    LDAPS: 0
    HTTP: 0
    HTTPS: 3
    eDir1 Est: 7
    eDir2 Est: 10

    Filesystem                   Size  Used Avail Use% Mounted on
    /dev/sda4                     74G   17G   58G  23% /
    udev                         4.0G   96K  4.0G   1% /dev
    tmpfs                        4.0G     0  4.0G   0% /dev/shm
    /dev/sda1                     92M   21M   66M  25% /boot
    /dev/sda3                    4.0G  1.6G  2.5G  40% /var
    172.16.Y.X:/srv/www/htdocs    26G  8.7G   16G  36% /srv/www/htdocs
    172.16.Y.X:/var/export       4.0G  2.0G  1.9G  51% /var/export
    172.16.Y.X:/srv/deploy        26G  8.7G   16G  36% /srv/deploy
    172.16.Y.X:/mnt/data          26G  8.7G   16G  36% /data
    //muoesfile2/data            3.2T  2.6T  610G  81% /mnt/oesfile2

    myMILLIKIN project is deployed.

Finally, we run JavaMelody on our cluster nodes as well, which gives some really good statistics (note that these stats also include our campus portal and webmail, not just CAS, but you get the idea):

>>> Bryan Wooten <[email protected]> 08/31/15 2:58 PM >>>
Hi all,

So twice in the past few months CAS (3.5.2) has gotten really slow. A restart of the Tomcat servers makes the issue go away. There are no errors in either cas.log or catalina.out, it is just really slow. Because the issue occurs only in production and not in test I have never had time to attempt any kind of root cause analysis.

Now, our Hazelcast is configured to use 85% of the heap, which is set to 2048 MB. We get about 200k logins a day. I think I may be running into a Tomcat/JVM tuning issue (heap size or garbage collection).

Does anyone have suggestions on how I should monitor this, or what config settings for Tomcat I should be using?

Thanks,

Bryan Wooten
Tel: (801)585-9323
Email: [email protected]

--
You are currently subscribed to [email protected] as: [email protected]
To unsubscribe, change settings or access archives, see http://www.ja-sig.org/wiki/display/JSG/cas-user
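
For anyone wanting heap, thread and GC figures like the ones in the diagnostics output quoted above, here is a minimal sketch using the standard java.lang.management MXBeans. The class name JvmDiagnostics, the method names and the exact label wording are hypothetical; this is not the page Chris attached, just one way such numbers could be read from inside the JVM.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: prints heap, thread and GC counters for the running JVM.
public class JvmDiagnostics {
    private static final long MB = 1024L * 1024L;

    // Formats one memory pool summary, e.g. "Java Heap in use/max: 2140M/5990M".
    public static String formatUsage(String label, MemoryUsage usage) {
        long max = usage.getMax(); // -1 means the maximum is undefined
        String maxText = (max < 0) ? "n/a" : (max / MB) + "M";
        return label + " in use/max: " + (usage.getUsed() / MB) + "M/" + maxText;
    }

    // Builds the full plain-text report, one figure per line.
    public static String report() {
        StringBuilder out = new StringBuilder();

        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        out.append(formatUsage("Java Heap", memory.getHeapMemoryUsage())).append('\n');
        out.append(formatUsage("Java non-Heap", memory.getNonHeapMemoryUsage())).append('\n');

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        out.append("Number of Java threads: ").append(threads.getThreadCount()).append('\n');
        out.append("Peak Java threads: ").append(threads.getPeakThreadCount()).append('\n');

        // One line per collector; names depend on the GC flags in use.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append("Garbage Collection: ").append(gc.getName())
               .append(": ").append(gc.getCollectionCount()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Run inside the Tomcat JVM (for example from a small servlet or JSP that calls report()) and polled periodically, output like this can show whether heap in use keeps creeping toward the maximum and whether the major collection count is climbing between polls, which is the sort of pattern worth checking before the next slowdown.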
