Related: I just got the latest shib-cas-authn plugin working on Shibboleth IDP 
4.1.2 so we can delegate authentication to CAS (6.3.5).  When I do this and try 
to authenticate, I see the following log messages (45 times), and the response 
time from CAS is so long that our IDP times out the session. 

 

Jul  8 10:12:53 105W user Loading SAML metadata from [/etc/cas/saml/metadata/sp_metadata.xml]

Jul  8 10:12:53 105W user No metadata signature location is defined for [/etc/cas/saml/metadata/sp_metadata.xml], so SignatureValidationFilter will not be invoked

Jul  8 10:12:53 105W user Initialized metadata resolver from [/etc/cas/saml/metadata/sp_metadata.xml]

Jul  8 10:12:53 105W user SAML metadata resolver [org.opensaml.saml.metadata.resolver.ChainingMetadataResolver] obtained from the cache is unable to produce/resolve valid metadata [/etc/cas/saml/metadata/sp_metadata.xml]. Metadata resolver cache entry with key [f2317……] has been invalidated. Retry attempt: [1]

 

Anyone ever seen this before?

 

Thanks, Jay 

________________________________

Jason Rappaport (he/him)

Identity and Access Management Analyst

Office of Information Technology

Email:   [email protected] 

Office:  609-258-8464

 

 

From: [email protected] <[email protected]> On Behalf Of Jason B. Rappaport
Sent: Wednesday, July 7, 2021 2:24 PM
To: [email protected]
Subject: [cas-user] RE: CAS 6.3 High CPU on Tomcat

 

Good afternoon.  Just to add a bit more information to this.  

 

Today we doubled the CPU and RAM on our on-prem CAS servers; they are now 
at 4 CPUs and 16 GB of RAM.  They are now stable (timeouts are gone for various 
services and the metrics endpoints are responding).

Our off-prem CAS servers are running fine with 2 CPUs and 8 GB of RAM; no 
change was made to them.  

 

Juan, you mentioned Hazelcast; we use that as well, for replicating information 
from our on-prem CAS servers to the off-prem CAS servers.  We have also 
encountered several instances where the CPU on our off-prem CAS servers is 
pegged in both our QA and PROD environments.  We have little to no traffic on 
our QA CAS servers, and what is interesting is that both environments (QA and 
PROD) have pegged CPUs on the same days.  When we investigated, we found that 
the Hazelcast cluster was constantly being reestablished.  I posted a message 
on the Hazelcast support community 
(https://groups.google.com/g/hazelcast/c/UmB1VzOBm-4) and then talked to the 
folks at Hazelcast.  Basically, what they said was that unless your virtual 
machine CAS servers are in the same datacenter, do not use the Hazelcast 
version that comes with CAS.  The Hazelcast folks indicated that using TCP/IP 
to maintain session information on CAS servers that are too distant (like 
having some CAS servers off-prem and some on-prem) would likely cause issues.  
They recommended purchasing their Hazelcast enterprise edition (which is really 
interesting and has a ton of cool features, but is also very expensive), which 
uses message queuing (MQ) technology instead of relying on TCP/IP to maintain 
session information.  In your logs, look for "Initialized new cluster 
connection between"; we had 20k of those messages in one day when the CPUs 
were pegged.  
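For anyone who wants to run the same check, here is a minimal Python sketch 
that counts that message per day.  Only the quoted phrase comes from our logs; 
the function name and the timestamp pattern (an ISO-style date at the start of 
each line) are assumptions for illustration and will need adjusting to your 
own log layout.

```python
import re
from collections import Counter

# The phrase quoted above; everything else in this sketch is an assumption.
PHRASE = "Initialized new cluster connection between"

# Hypothetical timestamp pattern: assumes each log line starts with an
# ISO-style date such as "2021-07-07". Adjust to the local log layout.
DATE_AT_START = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def count_reconnects_per_day(lines, phrase=PHRASE, date_pattern=DATE_AT_START):
    """Count lines containing `phrase`, grouped by the date at line start.

    Lines that contain the phrase but do not match `date_pattern` are
    counted under the key "unknown" so that nothing is silently dropped.
    """
    counts = Counter()
    for line in lines:
        if phrase not in line:
            continue
        match = date_pattern.match(line)
        day = match.group(1) if match else "unknown"
        counts[day] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle can be 
passed directly and large logs are read one line at a time.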

 

We asked our networking team about the stability of the connection between 
campus and our off-campus cloud provider, and they indicated it was stable 
enough that we would not notice any glitches.  That doesn't explain why QA and 
PROD saw pegged CPUs on the same day, as those hosts don't talk to each other.  

 

So doubling the CPU and RAM on our on-prem CAS servers (which handle only 
half as much authentication traffic as our off-prem CAS servers) seems to keep 
us stable, for now.  

 

Attached is a screenshot showing our CPU usage.  We upgraded CAS on 7/6 
(yesterday), which can be seen on the chart where the CPU is averaging about 
50%.  Today around 12:00pm, we doubled CPU and RAM, rebooted, and now CPU is at 
1%.  The gap in the data is something that happens sometimes with our data 
collection tools.  

 



 

Thanks, Jay 

 

 


 

 

From: [email protected] <[email protected]> On Behalf Of Juan Quintanilla
Sent: Wednesday, July 7, 2021 1:42 PM
To: CAS Community <[email protected]>
Subject: [cas-user] CAS 6.3 High CPU on Tomcat
Subject: [cas-user] CAS 6.3 High CPU on Tomcat

 

Hi,

 

We are running CAS 6.3 with Tomcat 9 and Java 11, and have the SAML2 and OAuth 
dependencies installed, with Hazelcast as the ticket registry and JSON files 
for the service registry.  We have noticed that after a few days of running, 
the CPU usage for Tomcat spikes to above 100% and requires a restart for it to 
come back down. When we check the load on the server, there aren't too many 
authentications happening. We had a similar Tomcat configuration when running 
CAS 5.3 with Tomcat 8.5 and didn't really see this issue. What we notice in the 
CAS logs is the metadata being loaded repeatedly, on several occasions more 
than 10 times for a single authentication. 
[org.apereo.cas.support.saml.services.idp.metadata.SamlRegisteredServiceServiceProviderMetadataFacade]
 - <Resolved metadata chain from [/etc/cas/saml/sp_metadata

 

In our Tomcat configuration we are using the NIO connector together with the 
OpenSSL implementation; we have also tried switching to NIO2, but we see 
similar behavior.  We have even bumped up the CPU on the server but still 
encounter the same problem. Based on various thread dumps, the issue doesn't 
seem to be with GC or Hazelcast.  Below is a snippet of one of the threads that 
was showing up with high CPU during a thread dump.

 

"https-openssl-nio-443-exec-63" #340 daemon prio=5 os_prio=0 cpu=783837.57ms 
elapsed=335561.54s tid=0x00007f44bc289800 nid=0x7c5f runnable 
[0x00007f4468f5d000]
java.lang.Thread.State: RUNNABLE
at java.lang.StackStreamFactory$AbstractStackWalker.fetchStackFrames([email protected]/Native Method)
at java.lang.StackStreamFactory$AbstractStackWalker.fetchStackFrames([email protected]/StackStreamFactory.java:386)
at java.lang.StackStreamFactory$AbstractStackWalker.getNextBatch([email protected]/StackStreamFactory.java:322)
at java.lang.StackStreamFactory$AbstractStackWalker.peekFrame([email protected]/StackStreamFactory.java:263)
at java.lang.StackStreamFactory$AbstractStackWalker.hasNext([email protected]/StackStreamFactory.java:351)
at java.lang.StackStreamFactory$StackFrameTraverser.tryAdvance([email protected]/StackStreamFactory.java:593)
at java.util.stream.ReferencePipeline.forEachWithCancel([email protected]/ReferencePipeline.java:127)
at java.util.stream.AbstractPipeline.copyIntoWithCancel([email protected]/AbstractPipeline.java:502)
at java.util.stream.AbstractPipeline.copyInto([email protected]/AbstractPipeline.java:488)
at java.util.stream.AbstractPipeline.wrapAndCopyInto([email protected]/AbstractPipeline.java:474)
at java.util.stream.FindOps$FindOp.evaluateSequential([email protected]/FindOps.java:150)
at java.util.stream.AbstractPipeline.evaluate([email protected]/AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.findFirst([email protected]/ReferencePipeline.java:543)
at org.apache.logging.log4j.util.StackLocator.lambda$getCallerClass$6(StackLocator.java:55)
at org.apache.logging.log4j.util.StackLocator$$Lambda$163/0x0000000800376840.apply(Unknown Source)
at java.lang.StackStreamFactory$StackFrameTraverser.consumeFrames([email protected]/StackStreamFactory.java:534)
at java.lang.StackStreamFactory$AbstractStackWalker.doStackWalk([email protected]/StackStreamFactory.java:306)
at java.lang.StackStreamFactory$AbstractStackWalker.callStackWalk([email protected]/Native Method)
at java.lang.StackStreamFactory$AbstractStackWalker.beginStackWalk([email protected]/StackStreamFactory.java:370)
at java.lang.StackStreamFactory$AbstractStackWalker.walk([email protected]/StackStreamFactory.java:243)
at java.lang.StackWalker.walk([email protected]/StackWalker.java:498)
at org.apache.logging.log4j.util.StackLocator.getCallerClass(StackLocator.java:54)
at org.apache.logging.log4j.util.StackLocatorUtil.getCallerClass(StackLocatorUtil.java:67)
at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:51)
at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:48)
at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:30)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:354)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:379)
at org.opensaml.core.xml.AbstractXMLObject.<init>(AbstractXMLObject.java:48)
at org.opensaml.saml.saml2.metadata.impl.EndpointImpl.<init>(EndpointImpl.java:59)
at org.opensaml.saml.saml2.metadata.impl.SingleSignOnServiceImpl.<init>(SingleSignOnServiceImpl.java:40)
at org.opensaml.saml.saml2.metadata.impl.SingleSignOnServiceBuilder.buildObject(SingleSignOnServiceBuilder.java:49)
at org.opensaml.saml.saml2.metadata.impl.SingleSignOnServiceBuilder.buildObject(SingleSignOnServiceBuilder.java:31)
at org.opensaml.core.xml.AbstractXMLObjectBuilder.buildObject(AbstractXMLObjectBuilder.java:58)
at org.opensaml.core.xml.AbstractXMLObjectBuilder.buildObject(AbstractXMLObjectBuilder.java:73)
at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.buildXMLObject(AbstractXMLObjectUnmarshaller.java:182)
at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.unmarshall(AbstractXMLObjectUnmarshaller.java:104)
at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.unmarshallChildElement(AbstractXMLObjectUnmarshaller.java:337)
at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.unmarshall(AbstractXMLObjectUnmarshaller.java:128)
at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.unmarshallChildElement(AbstractXMLObjectUnmarshaller.java:337)
at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.unmarshall(AbstractXMLObjectUnmarshaller.java:128)
at org.opensaml.saml.metadata.resolver.impl.DOMMetadataResolver.initMetadataResolver(DOMMetadataResolver.java:68)
at org.apereo.cas.support.saml.idp.metadata.locator.SamlIdPMetadataResolver.initMetadataResolver(SamlIdPMetadataResolver.java:64)
at org.opensaml.saml.metadata.resolver.impl.AbstractMetadataResolver.doInitialize(AbstractMetadataResolver.java:289)
at net.shibboleth.utilities.java.support.component.AbstractInitializableComponent.initialize(AbstractInitializableComponent.java:65)
- locked <0x00000005850f62c0> (a org.apereo.cas.support.saml.idp.metadata.locator.SamlIdPMetadataResolver)

..
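
In case it helps anyone compare dumps, below is a rough Python sketch for 
ranking the threads in a dump by the cpu= value on each thread header line, as 
in the header shown above.  The function name and regular expression are 
illustrative assumptions, not part of any tool, and the sketch assumes each 
thread header sits on a single line in the dump file (the header above is 
wrapped by the mail client).

```python
import re

# Matches a thread header line such as:
#   "name" #340 daemon prio=5 os_prio=0 cpu=783837.57ms elapsed=335561.54s ...
# Assumption: the header is on one line and carries cpu= and elapsed= fields.
THREAD_HEADER = re.compile(
    r'^"(?P<name>[^"]+)".*?\bcpu=(?P<cpu_ms>\d+(?:\.\d+)?)ms'
    r'\s+elapsed=(?P<elapsed_s>\d+(?:\.\d+)?)s'
)


def rank_threads_by_cpu(dump_text, top=10):
    """Return up to `top` tuples (name, cpu_ms, cpu_share), highest CPU first.

    cpu_share is CPU time divided by the thread's elapsed lifetime, so it is
    an average over the whole life of the thread, not a current rate.
    """
    rows = []
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if not match:
            continue  # stack frames and other non-header lines
        cpu_ms = float(match.group("cpu_ms"))
        elapsed_s = float(match.group("elapsed_s"))
        share = cpu_ms / (elapsed_s * 1000.0) if elapsed_s > 0 else 0.0
        rows.append((match.group("name"), cpu_ms, share))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]
```

Because cpu= is cumulative, taking two dumps some seconds apart and comparing 
the values per thread gives a better picture of what is busy right now than a 
single dump does.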

 

 

Has anyone noticed something similar?

 

Thanks!

 

___________________

Juan Quintanilla

-- 
- Website: https://apereo.github.io/cas
- Gitter Chatroom: https://gitter.im/apereo/cas
- List Guidelines: https://goo.gl/1VRrw7
- Contributions: https://goo.gl/mh7qDG
--- 
You received this message because you are subscribed to the Google Groups "CAS 
Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected] <mailto:[email protected]> .
To view this discussion on the web visit 
https://groups.google.com/a/apereo.org/d/msgid/cas-user/BN6PR05MB3474E121288FEA5212432C27861A9%40BN6PR05MB3474.namprd05.prod.outlook.com
 
<https://groups.google.com/a/apereo.org/d/msgid/cas-user/BN6PR05MB3474E121288FEA5212432C27861A9%40BN6PR05MB3474.namprd05.prod.outlook.com?utm_medium=email&utm_source=footer>
 .
