Related, I just got the latest shib-cas-authn plugin working on Shibboleth IDP 4.1.2 so we can delegate authentication to CAS (6.3.5). When I do this and try to authenticate, I see the following log message (45 times) and the response time from CAS is so long that our IDP timesout the seeion.
^XJul 8 10:12:53 105W user Loading SAML metadata from [/etc/cas/saml/metadata/sp_metadata.xml] Jul 8 10:12:53 105W user No metadata signature location is defined for [/etc/cas/saml/metadata/sp_metadata.xml], so SignatureValidationFilter will not be invoked Jul 8 10:12:53 105W user Initialized metadata resolver from [/etc/cas/saml/metadata/sp_metadata.xml] Jul 8 10:12:53 105W user SAML metadata resolver [org.opensaml.saml.metadata.resolver.ChainingMetadataResolver] obtained from the cache is unable to produce/resolve valid metadata [/etc/cas/saml/metadata/sp_metadata.xml]. Metadata resolver cache entry with key [f2317……] has been invalidated. Retry attempt: [1] Anyone ever seen this before? Thanks, Jay ________________________________ Jason Rappaport (he/him) Identity and Access Management Analyst Office of Information Technology Email: <mailto:[email protected]> [email protected] Office: 609-258-8464 From: [email protected] <[email protected]> On Behalf Of Jason B. Rappaport Sent: Wednesday, July 7, 2021 2:24 PM To: [email protected] Subject: [cas-user] RE: CAS 6.3 High CPU on Tomcat Good afternoon. Just to add a bit more information to this. Today we doubled the CPU and RAM for on our on-prem CAS servers; they are now at 4 CPUS and 16 GB of RAM. They are now stable (timeouts are gone for various services and the metrics endpoints are responding). Our off-prem CAS servers are running fine with 2 CPUS and 8 GB of RAM; no change was made to them. Juan – you mentioned Hazelcast, we use that as well for replicating information from our on-prem CAS servers to the off-prem CAS servers. We have also encountered several instances where our off-prem CAS servers CPU is pegged in both our QA and PROD environments. We have little to no traffic using our QA CAS servers and what is interesting is that both environments (QA and PROD) have pegged CPUs on the same days. When we investigated, we found that the Hazelcast cluster was constantly being reestablished. I posted a message on the Hazelcast support community https://groups.google.com/g/hazelcast/c/UmB1VzOBm-4 and then talked to the folks at Hazelcast. Basically what they said was that unless your virtual machine CAS servers are in the same datacenter, do not use the Hazelcast version that comes with CAS. The Hazelcast folks indicated that using TCP/IP to maintain session information on CAS severs that are too distant (like having CAS servers off-prem and some on-prem) would likely cause issues. They recommended purchasing their Hazelcast enterprise edition (which is really interesting and has a ton of cool features, but is also very expensive) that uses message queuing (MQ) technology instead of relying on TCP/IP to maintain session information. In your logs, look for “ Initialized new cluster connection between” we had 20k messages in one day that the CPUs were pegged. We asked our networking team about the stability between on campus and off campus cloud provider and they indicated the connection was stable enough that we would not notice any glitches; which doesn’t explain why QA and PROD saw pegged CPUs on the same day as those hosts don’t talk to each other. So for now, doubling the CPU and RAM on our on-prem CAS servers (which only handle ½ the authentication traffic as our off-prem CAS servers) seems to keep us stable…..for now. Attached is a screenshot showing our CPU usage. We upgraded CAS on 7.6 (yesterday), which can be seen on the chart where the CPU is averaging about 50%. Today around 12:00pm, we doubled CPU and RAM, rebooted, and now CPU is at 1%. The gap in the data…happens sometimes with our data collection tools. Thanks, Jay ________________________________ Jason Rappaport (he/him) Identity and Access Management Analyst Office of Information Technology Email: <mailto:[email protected]> [email protected] Office: 609-258-8464 From: [email protected] <mailto:[email protected]> <[email protected] <mailto:[email protected]> > On Behalf Of Juan Quintanilla Sent: Wednesday, July 7, 2021 1:42 PM To: CAS Community <[email protected] <mailto:[email protected]> > Subject: [cas-user] CAS 6.3 High CPU on Tomcat Hi, We are running CAS 6.3 with tomcat 9 and Java 11, and have SAML2 and oauth dependencies installed with hazelcast as ticket registry and json files for service registry. We have noticed that after a few days of running the CPU usage for tomcat spikes to above 100% and requires a restart for it to come back down. When we check the load on the server there isn't to many authentications happening. We had a similar tomcat configuration when running CAS 5.3 with tomcat 8.5 and didn't really see this issue. What we notice in the CAS logs is the metadata being loaded on several occasions over 10 times for a single authentication. [org.apereo.cas.support.saml.services.idp.metadata.SamlRegisteredServiceServiceProviderMetadataFacade] - <Resolved metadata chain from [/etc/cas/saml/sp_metadata In our tomcat configuration we are using nio connector together with the opensslimplementation, we have also tried switching to nio2 but we see similar behavior. We have even bumped up the CPU on the server but still encounter the same problem. Doing various thread dumps the issue doesn't seem to be with GC, or hazelcast. Below is a snippet of one of the threads that was showing up with high cpu during a thread dump. "https-openssl-nio-443-exec-63" #340 daemon prio=5 os_prio=0 cpu=783837.57ms elapsed=335561.54s tid=0x00007f44bc289800 nid=0x7c5f runnable [0x00007f4468f5d000] java.lang.Thread.State: RUNNABLE at java.lang.StackStreamFactory$AbstractStackWalker.fetchStackFrames([email protected]/Native <mailto:[email protected]/Native> Method) at java.lang.StackStreamFactory$AbstractStackWalker.fetchStackFrames([email protected]/StackStreamFactory.java:386 <mailto:[email protected]/StackStreamFactory.java:386> ) at java.lang.StackStreamFactory$AbstractStackWalker.getNextBatch([email protected]/StackStreamFactory.java:322 <mailto:[email protected]/StackStreamFactory.java:322> ) at java.lang.StackStreamFactory$AbstractStackWalker.peekFrame([email protected]/StackStreamFactory.java:263 <mailto:[email protected]/StackStreamFactory.java:263> ) at java.lang.StackStreamFactory$AbstractStackWalker.hasNext([email protected]/StackStreamFactory.java:351) at java.lang.StackStreamFactory$StackFrameTraverser.tryAdvance([email protected]/StackStreamFactory.java:593 <mailto:[email protected]/StackStreamFactory.java:593> ) at java.util.stream.ReferencePipeline.forEachWithCancel([email protected]/ReferencePipeline.java:127 <mailto:[email protected]/ReferencePipeline.java:127> ) at java.util.stream.AbstractPipeline.copyIntoWithCancel([email protected]/AbstractPipeline.java:502 <mailto:[email protected]/AbstractPipeline.java:502> ) at java.util.stream.AbstractPipeline.copyInto([email protected]/AbstractPipeline.java:488 <mailto:[email protected]/AbstractPipeline.java:488> ) at java.util.stream.AbstractPipeline.wrapAndCopyInto([email protected]/AbstractPipeline.java:474 <mailto:[email protected]/AbstractPipeline.java:474> ) at java.util.stream.FindOps$FindOp.evaluateSequential([email protected]/FindOps.java:150 <mailto:[email protected]/FindOps.java:150> ) at java.util.stream.AbstractPipeline.evaluate([email protected]/AbstractPipeline.java:234 <mailto:[email protected]/AbstractPipeline.java:234> ) at java.util.stream.ReferencePipeline.findFirst([email protected]/ReferencePipeline.java:543 <mailto:[email protected]/ReferencePipeline.java:543> ) at org.apache.logging.log4j.util.StackLocator.lambda$getCallerClass$6(StackLocator.java:55) at org.apache.logging.log4j.util.StackLocator$$Lambda$163/0x0000000800376840.apply(Unknown Source) at java.lang.StackStreamFactory$StackFrameTraverser.consumeFrames([email protected]/StackStreamFactory.java:534 <mailto:[email protected]/StackStreamFactory.java:534> ) at java.lang.StackStreamFactory$AbstractStackWalker.doStackWalk([email protected]/StackStreamFactory.java:306 <mailto:[email protected]/StackStreamFactory.java:306> ) at java.lang.StackStreamFactory$AbstractStackWalker.callStackWalk([email protected]/Native <mailto:[email protected]/Native> Method) at java.lang.StackStreamFactory$AbstractStackWalker.beginStackWalk([email protected]/StackStreamFactory.java:370 <mailto:[email protected]/StackStreamFactory.java:370> ) at java.lang.StackStreamFactory$AbstractStackWalker.walk([email protected]/StackStreamFactory.java:243 <mailto:[email protected]/StackStreamFactory.java:243> ) at java.lang.StackWalker.walk([email protected]/StackWalker.java:498 <mailto:[email protected]/StackWalker.java:498> ) at org.apache.logging.log4j.util.StackLocator.getCallerClass(StackLocator.java:54) at org.apache.logging.log4j.util.StackLocatorUtil.getCallerClass(StackLocatorUtil.java:67) at org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:51) at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:48) at org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:30) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:354) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:379) at org.opensaml.core.xml.AbstractXMLObject.<init>(AbstractXMLObject.java:48) at org.opensaml.saml.saml2.metadata.impl.EndpointImpl.<init>(EndpointImpl.java:59) at org.opensaml.saml.saml2.metadata.impl.SingleSignOnServiceImpl.<init>(SingleSignOnServiceImpl.java:40) at org.opensaml.saml.saml2.metadata.impl.SingleSignOnServiceBuilder.buildObject(SingleSignOnServiceBuilder.java:49) at org.opensaml.saml.saml2.metadata.impl.SingleSignOnServiceBuilder.buildObject(SingleSignOnServiceBuilder.java:31) at org.opensaml.core.xml.AbstractXMLObjectBuilder.buildObject(AbstractXMLObjectBuilder.java:58) at org.opensaml.core.xml.AbstractXMLObjectBuilder.buildObject(AbstractXMLObjectBuilder.java:73) at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.buildXMLObject(AbstractXMLObjectUnmarshaller.java:182) at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.unmarshall(AbstractXMLObjectUnmarshaller.java:104) at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.unmarshallChildElement(AbstractXMLObjectUnmarshaller.java:337) at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.unmarshall(AbstractXMLObjectUnmarshaller.java:128) at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.unmarshallChildElement(AbstractXMLObjectUnmarshaller.java:337) at org.opensaml.core.xml.io.AbstractXMLObjectUnmarshaller.unmarshall(AbstractXMLObjectUnmarshaller.java:128) at org.opensaml.saml.metadata.resolver.impl.DOMMetadataResolver.initMetadataResolver(DOMMetadataResolver.java:68) at org.apereo.cas.support.saml.idp.metadata.locator.SamlIdPMetadataResolver.initMetadataResolver(SamlIdPMetadataResolver.java:64) at org.opensaml.saml.metadata.resolver.impl.AbstractMetadataResolver.doInitialize(AbstractMetadataResolver.java:289) at net.shibboleth.utilities.java.support.component.AbstractInitializableComponent.initialize(AbstractInitializableComponent.java:65) - locked <0x00000005850f62c0> (a org.apereo.cas.support.saml.idp.metadata.locator.SamlIdPMetadataResolver) .. Has anyone noticed something similar? Thanks! ___________________ Juan Quintanilla -- - Website: https://apereo.github.io/cas - Gitter Chatroom: https://gitter.im/apereo/cas - List Guidelines: https://goo.gl/1VRrw7 - Contributions: https://goo.gl/mh7qDG --- You received this message because you are subscribed to the Google Groups "CAS Community" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]> . To view this discussion on the web visit https://groups.google.com/a/apereo.org/d/msgid/cas-user/BN6PR05MB3474E121288FEA5212432C27861A9%40BN6PR05MB3474.namprd05.prod.outlook.com <https://groups.google.com/a/apereo.org/d/msgid/cas-user/BN6PR05MB3474E121288FEA5212432C27861A9%40BN6PR05MB3474.namprd05.prod.outlook.com?utm_medium=email&utm_source=footer> . -- - Website: https://apereo.github.io/cas - Gitter Chatroom: https://gitter.im/apereo/cas - List Guidelines: https://goo.gl/1VRrw7 - Contributions: https://goo.gl/mh7qDG --- You received this message because you are subscribed to the Google Groups "CAS Community" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]> . To view this discussion on the web visit https://groups.google.com/a/apereo.org/d/msgid/cas-user/BL0PR04MB51569107299F3766876B135FCC1A9%40BL0PR04MB5156.namprd04.prod.outlook.com <https://groups.google.com/a/apereo.org/d/msgid/cas-user/BL0PR04MB51569107299F3766876B135FCC1A9%40BL0PR04MB5156.namprd04.prod.outlook.com?utm_medium=email&utm_source=footer> . -- - Website: https://apereo.github.io/cas - Gitter Chatroom: https://gitter.im/apereo/cas - List Guidelines: https://goo.gl/1VRrw7 - Contributions: https://goo.gl/mh7qDG --- You received this message because you are subscribed to the Google Groups "CAS Community" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/a/apereo.org/d/msgid/cas-user/BL0PR04MB5156A9FBADE3B86BB7B2F4EBCC199%40BL0PR04MB5156.namprd04.prod.outlook.com.
smime.p7s
Description: S/MIME cryptographic signature
