Tracked this down to a credential store. Our configuration uses LDAP as primary, with legacy Kerberos (via JAAS) as fallback (to go away some day). Authentication usually falls through to Kerberos because of bad passwords; on rare occasions it is because the user doesn’t have LDAP credentials.
Turning everything in the JAAS Krb5LoginModule up to debug, we found an occasional socket timeout to the KDC that correlated with the login timeouts. The default KDC request timeout, buried in the old Sun Java code (sun.security.krb5.KdcComm), is 30 seconds. By the time that expires, the AJP connector has already timed out and aborted the POST.

Options at this point are to reduce the aggregate KDC timeout (default 3 tries * socket timeout) to less than the AJP proxy timeout, to switch the Kerberos configuration to use TCP (default is “unreliable” UDP), maybe both, or to remove the AJP timeout altogether. The downside of removing the AJP timeout might be the risk of a rare connection “hang” with no response (long timeout; not that we’ve seen one other than what’s described here).

Tom.

On Nov 8, 2016, at 12:15 PM, Tom Poage <[email protected]> wrote:

Running CAS 4.2.6 on Linux (Oracle/RedHat Linux 7, VM, one “CPU”) w/ LDAP(tive) AuthN, Oracle Java 8, Tomcat 8(.0.33) fronted by Apache httpd 2.4 via AJP. The AJP connector is (somewhat arbitrarily) set to a 20-second response timeout.

Seeing occasional 500 errors returned on POST, with corresponding AJP header-receive timeout errors logged. This happens about once in 2000 POSTs to /cas/login, just often enough that end users have sometimes commented on it.

Anyone else observe something like this?

Load on the servers is low (0.1-0.2), no Hazelcast errors are logged, and it doesn’t seem to be LDAP (and not DNS; the LDAP IP is in /etc/hosts). Prior attempts at monitoring garbage collection didn’t reveal any obvious problems. Prior attempts at monitoring via JMX didn’t seem to show anything unusual (then again, maybe we didn’t catch this “in the act”).

Is a 20-second response timeout too short? If so, what’s reasonable? Thoughts on how to identify these “hangs”?
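
For reference, the first two options in the reply above (a shorter aggregate KDC timeout, and TCP instead of UDP) might look roughly like this in the [libdefaults] section of krb5.conf. This is a minimal sketch, not our actual configuration: the values are illustrative, and it assumes the Java Kerberos implementation honors kdc_timeout, max_retries and udp_preference_limit as described in the JDK Kerberos documentation, so check that for your JDK version. With the defaults mentioned above, 3 tries * 30 seconds is 90 seconds, well past the 20-second AJP timeout; with the values below the worst case against one KDC is about 3 * 5 = 15 seconds. If more than one KDC is listed for the realm, the worst case is likely to grow accordingly.

```ini
# Illustrative values only; tune them against your own AJP timeout.
[libdefaults]
    # Per-attempt KDC timeout. Assumption: the Java implementation
    # reads a bare number as milliseconds (verify for your JDK).
    kdc_timeout = 5000
    # Attempts per KDC: 3 x 5 s = 15 s, under the 20 s AJP timeout.
    max_retries = 3
    # Requests larger than this many bytes go over TCP,
    # so a value of 1 effectively forces TCP.
    udp_preference_limit = 1
```

The point is only to keep the aggregate Kerberos wait below the proxy timeout, so that a slow KDC produces a failed login rather than a 500 from the AJP connector.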
