Thank you, Bryan, for sharing this! I'll try a version using Tomcat with your configuration.
Arnaud

On Fri, Nov 16, 2018 at 23:19, Bryan Wooten <[email protected]> wrote:

> We also use Hazelcast across 4 CAS nodes, all active (behind a Citrix
> NetScaler with sticky sessions). We do about 400k logins per day (30k
> students and 20k staff).
>
> Duo is enabled for all employees. We don't use any Proxy Tickets at this time.
>
> I have 600+ servers in the JSON Service Registry, all wildcarded after
> the DNS name, so probably well over 1000 applications.
>
> We are on CAS 5.2.x running on Tomcat. This ps -ef output shows our startup
> settings. We never reboot. We use the default 8-hour TGT timeout.
>
> /opt/java/java/bin/java
> -Djava.util.logging.config.file=/opt/tomcat/tomcat/conf/logging.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Xms4096m
> -Xmx4096m
> -Xloggc:/opt/tomcat/tomcat/logs/gc.log
> -XX:+PrintHeapAtGC
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:-HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/var/tmp/tomcat-7
> -XX:+DisableExplicitGC
> -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC
> -XX:MaxGCPauseMillis=500
> -Djava.endorsed.dirs=/opt/tomcat/tomcat/endorsed
> -classpath /opt/tomcat/tomcat/bin/bootstrap.jar:/opt/tomcat/tomcat/bin/tomcat-juli.jar
> -Dcatalina.base=/opt/tomcat/tomcat
> -Dcatalina.home=/opt/tomcat/tomcat
> -Djava.io.tmpdir=/opt/tomcat/tomcat/temp
> org.apache.catalina.startup.Bootstrap start
>
> GC with Java is a black art. This just works for us.
>
> Hope this helps.
>
> -Bryan
>
> University of Utah
>
> On Fri, Nov 16, 2018 at 2:49 PM Nono <[email protected]> wrote:
>
>> Hello everyone,
>>
>> We successfully deployed CAS v5.2.3 to production a couple of days ago.
>>
>> Our configuration is: two active/passive CAS nodes with an in-memory
>> Hazelcast cluster (in the same JVM as CAS) that replicates the tickets.
>>
>> Everything worked fine for the first two hours, but when the connections
>> ramped up, the active node froze. We realized that the heap (2g max) was
>> full, so we stopped both nodes to bump up the Xmx to 6g on each node.
>>
>> After that CAS worked perfectly.
>> When monitoring the heap through the day, we noticed a very steep curve
>> going from 1g around 9am to a max of 5.5g around 11am. Then the curve
>> flattened and stayed around 5.5g until 8pm. After that the heap went down
>> to around 4g.
>>
>> During the 11am - 8pm period, several things happened:
>>
>> - major GC time increased up to 3s, degrading the response time of the
>> applications that use CAS. We suspect this is related to cache eviction;
>> the frequency was around one major GC every 30 min.
>>
>> - some users were disconnected without notice during the afternoon (or
>> had issues granting PTs), obviously a consequence of the cache hitting its
>> max allowed size and aggressively evicting tickets.
>>
>> We suspected an eviction problem with Hazelcast, so we took a heap dump
>> and installed the Hazelcast Management Center.
>>
>> Our first observations were:
>>
>> - we had a backup count set at 1, which doubled the size of the cluster.
>> - we had a huge number of PGTs: around 200000 for 3000 TGTs.
>> - PGTs are quite big: >10k each (according to the Hazelcast Management Center).
>>
>> So for the next day we disabled the Hazelcast backup.
>>
>> Now our heap usage is a little better.
>> The heap starts around 1g at 9am and plateaus at 5.5g around noon. From noon
>> to 4pm the curve stays flat around 5.5g with only minor GCs. Around 4pm major
>> GCs occur every 30 min until 6pm, then the heap goes down.
>>
>> Our tickets are supposed to expire after 6h. So, the way I read this is:
>> people start working around 9am and produce a lot of tickets between 9am and
>> noon, hence the steep curve. Between noon and 2pm activity slows down and
>> ticket production stops, while the tickets created around 8am start to be
>> evicted slowly. After 2pm activity picks up again and tickets are created.
>> Around 4pm the cache is full and massively evicts the tickets created in
>> the morning, hence the major GCs.
>>
>> No users complained about being disconnected, but the heap stays close to
>> its max for a good part of the day, and we still have around 200000 PGTs for
>> 3000 TGTs. We also have around 350 threads running all day.
>>
>> Our configuration is:
>> Xmx 6g
>> Eviction policy: default, with TTL 6h and TTK 6h for TGTs (and PGTs)
>> LFU
>> Hazelcast max heap size 70
>> GC: G1 on Java 8
>> CAS WAR overlay with Undertow
>> A dozen webapps using 60+ web services, all protected by CAS
>>
>> For now it works, but we have to restart the nodes every night to clean
>> the heap.
>> I don't like the idea of the heap being 90% full all day; if the
>> number of connections increases we might have unwanted disconnections
>> again. The thread count is a concern as well, and I would like to do
>> something about these issues.
>>
>> My questions:
>>
>> - are these numbers normal?
>>   - 200000 PGTs for 3000 TGTs
>>   - 3g of PGTs?
>>   - 350 threads all day?
>>   - 90% of the heap full all day?
>> - is our eviction policy correct?
>>
>> I can't decide whether we have a memory leak or whether this is normal,
>> considering our 3000 users and our 70+ applications linked by CAS.
>> We would feel more comfortable if the heap weren't at 90% all day.
>>
>> We have several options now:
>>
>> - try LRU instead of LFU
>> - reduce the TGT TTL to 4h
>> - use a different eviction policy, like a timeout on the tickets
>> - bump up the Xmx, hoping we would hit the sweet spot between memory
>> consumption and cache eviction, but taking the risk of lengthy major GCs
>> - put the Hazelcast clusters in their own JVM
>> - do nothing because everything is normal...
>>
>> I know it's a long text, so thank you for reading everything! Any advice
>> will be appreciated!
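The "3g of PGTs?" question can be sanity-checked with back-of-the-envelope arithmetic from the figures in the thread (about 200000 PGTs, each reported as >10k, backup count 1 and later 0). This is a rough sketch under stated assumptions, not a measurement: it reads ">10k" as 10 KiB per entry (a lower bound), assumes Hazelcast spreads partitions evenly over the two embedded members, and ignores per-entry map and JVM overhead. The class and method names are hypothetical.

```java
import java.util.Locale;

// Back-of-the-envelope memory estimate for the PGTs reported in this thread.
// Assumptions (not measured): 10 KiB per PGT, partitions spread evenly over
// the cluster members, and no Hazelcast/JVM per-entry overhead.
public class PgtMemoryEstimate {

    // Bytes held cluster-wide: every entry has one primary copy plus
    // `backupCount` backup copies.
    static long clusterBytes(long entries, long bytesPerEntry, int backupCount) {
        return entries * bytesPerEntry * (1L + backupCount);
    }

    // Bytes held by a single member if partitions are evenly distributed.
    static long bytesPerMember(long entries, long bytesPerEntry, int backupCount, int members) {
        return clusterBytes(entries, bytesPerEntry, backupCount) / members;
    }

    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long pgts = 200_000L;          // "around 200000 PGTs"
        long tgts = 3_000L;            // "for 3000 TGTs"
        long bytesPerPgt = 10L * 1024; // assumption: ">10k" read as 10 KiB
        int members = 2;               // two CAS nodes with embedded Hazelcast

        System.out.printf(Locale.ROOT, "PGTs per TGT: %.1f%n", (double) pgts / tgts);
        for (int backupCount = 1; backupCount >= 0; backupCount--) {
            System.out.printf(Locale.ROOT,
                    "backup count %d: %.2f GiB cluster-wide, %.2f GiB per member%n",
                    backupCount,
                    toGiB(clusterBytes(pgts, bytesPerPgt, backupCount)),
                    toGiB(bytesPerMember(pgts, bytesPerPgt, backupCount, members)));
        }
    }
}
```

Under these assumptions it prints about 66.7 PGTs per TGT, 3.81 GiB cluster-wide (1.91 GiB per member) with backup count 1, and 1.91 GiB cluster-wide (0.95 GiB per member) with no backup. That is the same order of magnitude as the ~3g figure, and since 10k is a lower bound, the real footprint is likely larger.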
>>
>> --
>> - Website: https://apereo.github.io/cas
>> - Gitter Chatroom: https://gitter.im/apereo/cas
>> - List Guidelines: https://goo.gl/1VRrw7
>> - Contributions: https://goo.gl/mh7qDG
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "CAS Community" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/a/apereo.org/d/msgid/cas-user/ec5d098d-d5f9-4ec3-99b0-0f773ca966b3%40apereo.org
>> .
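Nono's reading of the heap curve (tickets pile up until a size-based cap forces the eviction of tickets that have not yet reached their TTL) can be illustrated with a toy model. This sketch is not how Hazelcast evicts: in the setup above its limit is a percentage of used heap rather than an entry count, and the issue rate of 250 tickets per hour is a hypothetical figure. It only assumes a map with a 6-hour TTL plus a hard entry cap enforced with LRU order via `LinkedHashMap.removeEldestEntry`. As a rough rule, the number of live tickets at steady state is about issue rate × TTL, so any cap below that throws away valid tickets.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model, NOT Hazelcast's eviction code: a ticket map with a per-entry TTL
// and a hard entry cap enforced by LRU eviction (LinkedHashMap access order).
// Time is measured in whole hours to keep the example readable.
public class TicketCacheSketch {

    static final class BoundedTicketMap extends LinkedHashMap<String, Long> {
        private final int maxEntries;
        private final long ttlHours;
        private long now;            // simulated clock, set before each put
        int evictedBeforeExpiry;     // still-valid tickets dropped by the cap

        BoundedTicketMap(int maxEntries, long ttlHours) {
            super(16, 0.75f, true);  // true = access order, i.e. LRU iteration
            this.maxEntries = maxEntries;
            this.ttlHours = ttlHours;
        }

        // Invoked by LinkedHashMap after each put; returning true removes the
        // least recently used entry.
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
            if (size() <= maxEntries) {
                return false;
            }
            if (now - eldest.getValue() < ttlHours) {
                evictedBeforeExpiry++; // the ticket had not reached its TTL yet
            }
            return true;
        }

        // TTL-based cleanup: drop every ticket created ttlHours or more ago.
        void expire(long hour) {
            Iterator<Map.Entry<String, Long>> it = entrySet().iterator();
            while (it.hasNext()) {
                if (hour - it.next().getValue() >= ttlHours) {
                    it.remove();
                }
            }
        }

        void issue(String ticketId, long hour) {
            now = hour;
            put(ticketId, hour);     // value = creation hour
        }
    }

    // Issues `perHour` tickets every hour from 9 to 16 and returns how many
    // still-valid tickets were evicted because the cap was reached.
    static int simulate(int maxEntries, long ttlHours, int perHour) {
        BoundedTicketMap map = new BoundedTicketMap(maxEntries, ttlHours);
        for (long hour = 9; hour <= 16; hour++) {
            map.expire(hour);
            for (int i = 0; i < perHour; i++) {
                map.issue("TGT-" + hour + "-" + i, hour);
            }
        }
        return map.evictedBeforeExpiry;
    }

    public static void main(String[] args) {
        // 250 tickets/hour * 6 h TTL = about 1500 live tickets at steady state.
        System.out.println("cap 1000: " + simulate(1000, 6, 250) + " valid tickets evicted");
        System.out.println("cap 2000: " + simulate(2000, 6, 250) + " valid tickets evicted");
    }
}
```

With these made-up numbers, the 1000-entry cap evicts 1000 tickets that were still within their TTL (the morning tickets, as in the afternoon disconnections described above), while the 2000-entry cap evicts none, because TTL expiry alone keeps the live set at about 1500.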
