Re: [cas-user] Cas heap filling up quickly

Arnaud Nemrod Fri, 16 Nov 2018 14:30:02 -0800

Thank you, Bryan, for sharing this!

I'll try a version using Tomcat with your configuration.


Arnaud

On Fri, Nov 16, 2018 at 11:19 PM Bryan Wooten <[email protected]> wrote:

> We also use Hazelcast across 4 CAS nodes, all active (behind a Citrix
> NetScaler with sticky sessions). We do about 400k logins per day (30k
> students and 20k staff).
>
> Duo is enabled for all employees. We don't use any Proxy Tickets at this
> time.
>
> I have 600+ servers in the JSON Service Registry, all wildcarded after
> the DNS name, so probably well over 1000 applications.
>
> We are on CAS 5.2.x running on Tomcat. This ps -ef output shows our
> startup settings. We never reboot. We use the default 8-hour TGT timeout.
>
> /opt/java/java/bin/java
> -Djava.util.logging.config.file=/opt/tomcat/tomcat/conf/logging.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms4096m
> -Xmx4096m -Xloggc:/opt/tomcat/tomcat/logs/gc.log -XX:+PrintHeapAtGC
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:-HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/var/tmp/tomcat-7 -XX:+DisableExplicitGC
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:MaxGCPauseMillis=500
> -Djava.endorsed.dirs=/opt/tomcat/tomcat/endorsed -classpath
> /opt/tomcat/tomcat/bin/bootstrap.jar:/opt/tomcat/tomcat/bin/tomcat-juli.jar
> -Dcatalina.base=/opt/tomcat/tomcat -Dcatalina.home=/opt/tomcat/tomcat
> -Djava.io.tmpdir=/opt/tomcat/tomcat/temp
> org.apache.catalina.startup.Bootstrap start
>
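
A note for anyone reading a command line like the one quoted above: HotSpot
boolean options are written -XX:+Name (enabled) or -XX:-Name (disabled), so
it is worth checking the sign on each one. Below is a minimal sketch in
Python that sorts the -XX options into groups. The helper name
classify_xx_flags is made up for this illustration and the input is only a
subset of the quoted options.

```python
import shlex

def classify_xx_flags(command_line):
    """Split HotSpot -XX options into enabled, disabled and valued groups."""
    enabled, disabled, valued = [], [], {}
    for token in shlex.split(command_line):
        if not token.startswith("-XX:"):
            continue  # ignore -Xms/-Xmx, -D properties, class names, etc.
        body = token[len("-XX:"):]
        if body.startswith("+"):
            enabled.append(body[1:])
        elif body.startswith("-"):
            disabled.append(body[1:])
        elif "=" in body:
            name, value = body.split("=", 1)
            valued[name] = value
    return enabled, disabled, valued

# Subset of the options from the quoted ps -ef output.
flags = (
    "-Xms4096m -Xmx4096m -XX:+PrintHeapAtGC -XX:+PrintGCDetails "
    "-XX:+PrintGCTimeStamps -XX:-HeapDumpOnOutOfMemoryError "
    "-XX:HeapDumpPath=/var/tmp/tomcat-7 -XX:+DisableExplicitGC "
    "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:MaxGCPauseMillis=500"
)
enabled, disabled, valued = classify_xx_flags(flags)
print("enabled: ", enabled)
print("disabled:", disabled)
print("valued:  ", valued)
```

With the options as quoted, HeapDumpOnOutOfMemoryError ends up in the
disabled group, so as far as I can tell the HeapDumpPath setting would only
come into play if that flag were switched to the + form.
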
> GC with Java is a black art. This just works for us.
>
> Hope this helps.
>
> -Bryan
>
> University of Utah
>
> On Fri, Nov 16, 2018 at 2:49 PM Nono <[email protected]> wrote:
>
>> Hello everyone,
>>
>> We successfully deployed CAS v5.2.3 to production a couple of days ago.
>>
>> Our configuration is: two active/passive CAS nodes with an in-memory
>> Hazelcast cluster (same JVM as CAS) that replicates the tickets.
>>
>> Everything worked fine for the first two hours, but when the connections
>> ramped up, the active node froze. We realized that the heap (2g max) was
>> full, so we stopped both nodes to bump -Xmx up to 6g on each node.
>>
>> After that, CAS worked perfectly.
>> When monitoring the heap through the day, we noticed a very steep curve
>> going from 1g around 9am to a max of 5.5g around 11am. Then the curve
>> flattened and stayed around 5.5g until 8pm. After that the heap went down
>> to around 4g.
>>
>> During the 11am - 8pm period, several things happened:
>>
>> - major GC time increased up to 3s, degrading the response time of the
>> applications that use CAS. We suspect this is related to cache eviction;
>> the frequency was around one major GC every 30 min.
>>
>> - some users were disconnected without notice during the afternoon (or
>> had issues getting PTs granted), obviously a consequence of the cache
>> hitting its max allowed size and aggressively evicting tickets.
>>
>> We suspected an eviction problem with Hazelcast, so we did a heap dump
>> and we installed Hazelcast Management Center.
>>
>> Our first observations were:
>>
>> - we had a backup count set at 1, which doubled the size of the cluster.
>> - we had a huge number of PGTs: around 200000 for 3000 TGTs
>> - PGTs are quite big, >10k each (according to Hazelcast Management
>> Center)
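
A rough back-of-the-envelope check of those three observations, as a Python
sketch. It assumes ">10k" means about 10 KB per PGT (the unit is not stated)
and that a backup count of 1 keeps one extra copy of every entry somewhere in
the cluster. The function name is made up for the example.

```python
PGT_COUNT = 200_000          # from the observation above
TGT_COUNT = 3_000            # from the observation above
PGT_SIZE_BYTES = 10 * 1024   # assumption: ">10k" read as about 10 KB each
BACKUP_COUNT = 1             # the backup count that was configured

def ticket_footprint_gib(count, size_bytes, backup_count):
    """Estimated cluster-wide memory for primary entries plus backups, in GiB."""
    copies = 1 + backup_count
    return count * size_bytes * copies / 1024**3

pgts_per_tgt = PGT_COUNT / TGT_COUNT
primary_only = ticket_footprint_gib(PGT_COUNT, PGT_SIZE_BYTES, 0)
with_backup = ticket_footprint_gib(PGT_COUNT, PGT_SIZE_BYTES, BACKUP_COUNT)

print(f"PGTs per TGT:             {pgts_per_tgt:.0f}")
print(f"PGT data, no backup:      {primary_only:.2f} GiB")
print(f"PGT data, backup count 1: {with_backup:.2f} GiB")
```

Since the reported size is "more than" 10k, these figures are a lower bound:
roughly 2 GiB of PGT data without backups and close to 4 GiB with them, at
about 67 PGTs per TGT.
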
>>
>> So for the next day we disabled the Hazelcast backup.
>>
>> Now our heap usage is a little better.
>> The heap starts around 1g at 9am and plateaus at 5.5g around 12. From 12
>> to 4pm the curve stays flat around 5.5g with only minor GCs. Around 4pm,
>> major GCs occur every 30 min until 6pm, then the heap goes down.
>>
>> Our tickets are supposed to expire after 6h. So, the way I read this is:
>> people start working around 9am and produce a lot of tickets between 9
>> and 12, hence the steep curve. Between 12 and 2pm the activity slows down
>> and ticket production stops, while the tickets created around 8am start
>> to be evicted slowly. After 2pm activity starts again and tickets are
>> created. Around 4pm the cache is full and massively evicts the tickets
>> created in the morning, hence the major GCs.
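
To make that reading concrete, here is a small Python sketch of how many
tickets would be alive at each hour if every ticket lived exactly 6h from
creation. The hourly creation numbers are invented purely for illustration
(they are not measurements from this deployment), and the model ignores
size-based eviction and idle timeouts.

```python
TTL_HOURS = 6  # ticket lifetime stated above

# Hypothetical tickets created per hour of day; illustrative numbers only.
created_per_hour = {
    9: 40_000, 10: 40_000, 11: 40_000,   # morning ramp-up
    12: 5_000, 13: 5_000,                # lunch break
    14: 30_000, 15: 30_000,              # afternoon activity
    16: 10_000, 17: 5_000,
}

def live_tickets(hour, created, ttl_hours):
    """Tickets created at or before `hour` that have not reached their TTL."""
    return sum(n for h, n in created.items() if h <= hour < h + ttl_hours)

for hour in range(9, 21):
    alive = live_tickets(hour, created_per_hour, TTL_HOURS)
    expiring = created_per_hour.get(hour - TTL_HOURS, 0)
    print(f"{hour:02d}:00  live={alive:>7}  expiring={expiring:>6}")
```

Under these assumptions the live count keeps climbing until early afternoon,
and the large morning batches only start to expire from 3pm onwards, which is
in the neighbourhood of the 4pm burst of major GCs described above.
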
>>
>> No users complained about being disconnected, but the heap stays close to
>> its max for a good part of the day, and we still have around 200000 PGTs
>> for 3000 TGTs. And we have around 350 threads running all day.
>>
>> Our configuration is:
>> Xmx: 6g
>> Eviction policy: default with TTL 6h, TTK 6h for TGTs (and PGTs); LFU
>> Hazelcast max heap size: 70
>> GC: G1 on Java 8
>> CAS WAR overlay with Undertow
>> A dozen webapps using 60+ web services, all protected by CAS
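
One way to relate these settings to the heap curve, sketched in Python. It
assumes "Hazelcast max heap size 70" is a percentage of the JVM maximum heap
(the unit is not stated in the message); the function name is made up for the
example.

```python
XMX_GIB = 6.0                     # -Xmx from the configuration above
HAZELCAST_MAX_HEAP_PERCENT = 70   # assumption: a percentage of max heap
OBSERVED_PLATEAU_GIB = 5.5        # plateau reported earlier in the message

def eviction_threshold_gib(xmx_gib, percent):
    """Heap size at which a percentage-based limit would be reached."""
    return xmx_gib * percent / 100

threshold = eviction_threshold_gib(XMX_GIB, HAZELCAST_MAX_HEAP_PERCENT)
plateau_percent = OBSERVED_PLATEAU_GIB / XMX_GIB * 100

print(f"70% of a 6g heap:     {threshold:.1f} GiB")
print(f"5.5g plateau of heap: {plateau_percent:.0f}%")
```

Under that reading the limit sits at about 4.2g while the heap plateaus at
about 92% of -Xmx. The plateau includes everything on the heap (garbage and
non-ticket data as well), so the comparison is only indicative.
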
>>
>>
>> For now it works, but we have to restart the nodes every night to clean
>> the heap.
>> I don't like the idea of the heap being 90% full all day; if the
>> number of connections increases we might have unwanted disconnections
>> again. The thread count is a concern as well, and I would like to do
>> something about these issues.
>>
>> My questions:
>>
>> - are these numbers normal?
>>   - 200000 PGTs for 3000 TGTs
>>   - 3g of PGTs?
>>   - 350 threads all day?
>>   - 90% of the heap full all day?
>>   - is our eviction policy correct?
>>
>> I can't decide if we have a memory leak or if it's a normal situation
>> considering our 3000 users and our 70+ applications linked by CAS.
>> We would feel more comfortable if the heap wasn't at 90% all day.
>>
>> We have several options now:
>>
>> - try LRU instead of LFU
>> - reduce the TGT TTL to 4h
>> - use a different eviction policy, like a timeout on the tickets
>> - bump up -Xmx, hoping we would hit the sweet spot between memory
>> consumption and cache eviction, but taking the risk of lengthy major GCs
>> - put the Hazelcast clusters in their own JVM
>> - do nothing because everything is normal ...
>>
>>
>> I know it's a long text, so thank you for reading everything! Any advice
>> will be appreciated!
>>
>> --
>> - Website: https://apereo.github.io/cas
>> - Gitter Chatroom: https://gitter.im/apereo/cas
>> - List Guidelines: https://goo.gl/1VRrw7
>> - Contributions: https://goo.gl/mh7qDG
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "CAS Community" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/a/apereo.org/d/msgid/cas-user/ec5d098d-d5f9-4ec3-99b0-0f773ca966b3%40apereo.org
>> .
>>

