So to follow up on this, what appeared to precipitate the failure is clients looping on service ticket requests, filling the ST cache to one of the (handful of) capacity limits.
We've seen this before but, so far, are at a loss to explain it. They're not failed logins. Of the hundred thousand or so people using the service, it affects only a handful at a time.

I might initially suspect the CAS client software (one instance I can think of is "home grown"). Given it affects specific clients, I get the idea the browser is somehow involved in this, e.g. one time we caught up with a student where this was happening and a browser restart resolved the looping.

I know CAS has the ability to throttle login attempts. We could simply attempt to throttle requests by source IP, except that certain high-use services are proxied behind a single egress point, ruling that out.

Any thoughts on detecting loops for a specific TGT (cf. TGC)?

Thanks.
Tom.

On May 2, 2014, at 9:02 AM, Tom Poage <[email protected]> wrote:

> On May 1, 2014, at 3:59 PM, Tom Poage <[email protected]> wrote:
>> CAS node today starts throwing disk write errors:
>>
>>> 2014-05-01 15:20:52,521 ERROR
>>> [net.sf.ehcache.store.disk.DiskStorageFactory] - <Disk Write of
>>> ST-...-caswebNN failed:
>
>>> java.util.ConcurrentModificationException
>>> at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
>> ...
>>
>> ST on-disk cache size pretty close to 'magic' value 2.2 GB:
>
> And found the heap on the JVM was set to 2 GB.
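
On the question of detecting loops for a specific TGT, one possible approach is a sliding-window counter keyed by TGT id: count service ticket requests per TGT and flag any TGT that exceeds a threshold within a short interval. The sketch below only illustrates that idea. The class name TgtLoopDetector, its methods, and the threshold and window values are all hypothetical; none of them are part of CAS, and where such a check would hook into ticket granting is left open.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a CAS class): flags a TGT that asks for
 * too many service tickets within a sliding time window.
 */
final class TgtLoopDetector {
    private final int maxRequests;    // assumed threshold, tune to taste
    private final long windowMillis;  // assumed window length
    private final Map<String, Deque<Long>> recent = new HashMap<>();

    TgtLoopDetector(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /**
     * Records one ST request for the given TGT id at time nowMillis.
     * Returns true if the TGT has made more than maxRequests requests
     * inside the window, i.e. it looks like it is looping.
     */
    synchronized boolean recordAndCheck(String tgtId, long nowMillis) {
        Deque<Long> times = recent.computeIfAbsent(tgtId, k -> new ArrayDeque<>());
        // Drop timestamps that have fallen out of the window.
        while (!times.isEmpty() && nowMillis - times.peekFirst() >= windowMillis) {
            times.pollFirst();
        }
        times.addLast(nowMillis);
        return times.size() > maxRequests;
    }

    /** Forgets a TGT, e.g. when it expires or the user logs out. */
    synchronized void forget(String tgtId) {
        recent.remove(tgtId);
    }
}
```

Keying on the TGT rather than the source IP would sidestep the single-egress-point problem, since each user behind the proxy still holds a distinct TGT. A flagged TGT could be logged for follow-up rather than blocked outright, at least until the cause of the looping is understood.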
