We also use Hazelcast, across 4 CAS nodes, all active (behind a Citrix
NetScaler with sticky sessions). We do about 400k logins per day (30k
students and 20k staff).

Duo enabled for all employees. We don't use any Proxy Tickets at this time.

I have 600+ servers in the JSON service registry, all wildcarded after the
DNS name, so probably well over 1,000 applications.
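
For context, each entry is just a regex that wildcards everything after
the host. A minimal sketch of one such JSON definition (the hostname and
id are made up):

{
  "@class" : "org.apereo.cas.services.RegexRegisteredService",
  "serviceId" : "^https://some-app\\.example\\.edu(/.*)?$",
  "name" : "Some App",
  "id" : 1001
}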

We are on CAS 5.2.x running on Tomcat. This ps -ef output shows our startup
settings. We never reboot. We use the default 8-hour TGT timeout.

/opt/java/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat/tomcat/conf/logging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms4096m
-Xmx4096m -Xloggc:/opt/tomcat/tomcat/logs/gc.log -XX:+PrintHeapAtGC
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/tmp/tomcat-7 -XX:+DisableExplicitGC
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:MaxGCPauseMillis=500
-Djava.endorsed.dirs=/opt/tomcat/tomcat/endorsed -classpath
/opt/tomcat/tomcat/bin/bootstrap.jar:/opt/tomcat/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat/tomcat -Dcatalina.home=/opt/tomcat/tomcat
-Djava.io.tmpdir=/opt/tomcat/tomcat/temp
org.apache.catalina.startup.Bootstrap start

GC with Java is a black art. This just works for us.
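
For anyone replicating this on a stock Tomcat: those flags would normally
live in a bin/setenv.sh, roughly like this sketch (paths assume our layout;
this is not our exact file):

# /opt/tomcat/tomcat/bin/setenv.sh -- sourced by catalina.sh at startup
CATALINA_OPTS="$CATALINA_OPTS -Xms4096m -Xmx4096m"
CATALINA_OPTS="$CATALINA_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC"
CATALINA_OPTS="$CATALINA_OPTS -Xloggc:/opt/tomcat/tomcat/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC"
export CATALINA_OPTS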

Hope this helps.

-Bryan

University of Utah

On Fri, Nov 16, 2018 at 2:49 PM Nono <arnaud.nem...@gmail.com> wrote:

> Hello everyone,
>
> We successfully deployed CAS v5.2.3 to production a couple of days ago.
>
> Our configuration is: two active/passive CAS nodes with an in-memory
> Hazelcast cluster (same JVM as CAS) that replicates the tickets.
>
> Everything worked fine for the first two hours, but when the connections
> ramped up, the active node froze. We realized that the heap (2g max) was
> full, so we stopped both nodes and bumped the Xmx up to 6g on each node.
>
> After that, CAS worked perfectly.
> Monitoring the heap through the day, we noticed a very steep curve going
> from 1g around 9am to a max of 5.5g around 11am. Then the curve flattened
> and stayed around 5.5g until 8pm. After that the heap went down to around
> 4g.
>
> During the 11am-8pm period, several things happened:
>
> - major GC times increased up to 3s, degrading the response time of the
> applications that use CAS. We suspect this is related to cache eviction;
> the frequency was around one major GC every 30 min (see the logging-flag
> sketch after this list).
>
> - some users were disconnected without notice during the afternoon (or
> had issues granting PTs), obviously a consequence of the cache hitting its
> max allowed size and aggressively evicting tickets.
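>
> (To quantify those pauses we could enable GC logging on Java 8 with
> flags along these lines -- a sketch, the log path is a placeholder:)
>
> -Xloggc:/var/log/cas/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -XX:+PrintGCApplicationStoppedTime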
>
> We suspected an eviction problem with Hazelcast, so we took a heap dump
> and installed the Hazelcast Management Center.
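>
> (For reference, a live-objects heap dump on Java 8 can be taken with
> jmap; the pid and file path here are placeholders:)
>
> jmap -dump:live,format=b,file=/var/tmp/cas-heap.hprof <pid>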
>
> Our first observations were:
>
> - we had a backup count set to 1, which doubled the memory footprint of
> the cluster.
> - we had a huge number of PGTs: around 200,000 for 3,000 TGTs
> - PGTs are quite big, >10 KB each (according to the Hazelcast Management
> Center). At that size, the 200,000 PGTs alone account for roughly 2 GB of
> heap.
>
> So for the next day we disabled the Hazelcast backup.
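>
> (In CAS 5.2 this maps to a single Hazelcast ticket registry property --
> a sketch, assuming the standard property name:)
>
> cas.ticket.registry.hazelcast.cluster.backupCount=0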
>
> Now our heap usage is a little better.
> The heap starts around 1g at 9am and plateaus at 5.5g around noon. From
> noon to 4pm the curve stays flat around 5.5g with only minor GCs. From
> around 4pm, major GCs occur every 30 min until 6pm, then the heap goes
> down.
>
> Our tickets are supposed to expire after 6h. So, the way I read this is:
> people start working around 9am and produce a lot of tickets between 9am
> and noon, hence the steep curve. Between noon and 2pm, activity slows
> down and ticket production stops, while the tickets created early in the
> morning slowly start to be evicted. After 2pm, activity picks up again
> and new tickets are created. Around 4pm the cache is full and massively
> evicts the tickets created in the morning, hence the major GCs.
>
> No users complained about being disconnected, but the heap stays close to
> its max for a good part of the day, and we still have around 200,000 PGTs
> for 3,000 TGTs. We also have around 350 threads running all day.
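>
> (Per-state thread counts can be double-checked with a stack dump, e.g.
> on Java 8; the pid is a placeholder:)
>
> jstack <pid> | grep 'java.lang.Thread.State' | sort | uniq -c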
>
> Our configuration is (see the property sketch after this list):
>
> - Xmx: 6g
> - Eviction policy: default, with TTL 6h and TTK (time-to-kill) 6h for
> TGTs (and PGTs)
> - LFU eviction
> - Hazelcast max heap size: 70%
> - GC: G1 on Java 8
> - CAS WAR overlay running on Undertow
> - A dozen webapps using 60+ web services, all protected by CAS
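>
> (A sketch of how the ticket and eviction settings map to CAS 5.2
> properties -- standard property names assumed, with 6h = 21600s:)
>
> cas.ticket.tgt.maxTimeToLiveInSeconds=21600
> cas.ticket.tgt.timeToKillInSeconds=21600
> cas.ticket.registry.hazelcast.cluster.evictionPolicy=LFU
> cas.ticket.registry.hazelcast.cluster.maxHeapSizePercentage=70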
>
>
> For now it works, but we have to restart the nodes every night to clean
> the heap.
> I don't like the idea of the heap being 90% full all day: if the number
> of connections increases, we might see unwanted disconnections again. The
> thread count is a concern as well, and I would like to do something about
> these issues.
>
> My questions:
>
> - are these numbers normal?
>   - 200,000 PGTs for 3,000 TGTs?
>   - 3g of PGTs?
>   - 350 threads all day?
>   - 90% of the heap full all day?
> - is our eviction policy correct?
>
> I can't decide whether we have a memory leak or whether this is a normal
> situation considering our 3,000 users and our 70+ applications linked by
> CAS. We would feel more comfortable if the heap weren't at 90% all day.
>
> We have several options now:
>
> - try LRU instead of LFU (see the property sketch after this list)
> - reduce the TGT TTL to 4h
> - use a different eviction policy, like a timeout on the tickets
> - bump up the Xmx, hoping to hit the sweet spot between memory
> consumption and cache eviction, but taking the risk of lengthy major GCs
> - put the Hazelcast cluster in its own JVM
> - do nothing, because everything is normal...
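>
> (For reference, the first two options would be something like this in
> cas.properties, with the same assumed property names:)
>
> cas.ticket.registry.hazelcast.cluster.evictionPolicy=LRU
> cas.ticket.tgt.maxTimeToLiveInSeconds=14400
> cas.ticket.tgt.timeToKillInSeconds=14400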
>
>
> I know it's a long text, so thank you for reading everything! Any advice
> will be appreciated!
