Hello,

We have a two-node (active-active) CAS 4.0 environment backed by a JPA
ticket registry on an active-active MySQL setup. This setup has been running
for about 3 to 4 months now, but we are experiencing some problems while
cleaning the registry since we have a long TGT timeout: 90 days. The idea is
basically that users don't have to log in again if they don't want to.

This ran fine for about a month, until TGTs started to pile up (this was
expected of course). After that we experienced the same problems as
described here: https://lists.wisc.edu/read/messages?id=11402745. I fixed
those by making sure that we are 'streaming' the tickets (forward-only
ResultSet) and only adding them to the list of expired tickets if they are
expired.
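
To make that fix concrete, here is a minimal sketch of the idea, not the
actual CAS cleaner code. StoredTicket, StreamingCleanerSketch and
collectExpired are names I made up for illustration, and the Iterator stands
in for the forward-only cursor over the ticket table.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for a stored ticket (not the CAS Ticket interface).
final class StoredTicket {
    final String id;
    final long expiresAtMillis;

    StoredTicket(String id, long expiresAtMillis) {
        this.id = id;
        this.expiresAtMillis = expiresAtMillis;
    }

    boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }
}

final class StreamingCleanerSketch {
    // Walks the cursor exactly once. Only expired tickets are retained,
    // so live tickets become garbage as soon as the loop moves on.
    static List<StoredTicket> collectExpired(Iterator<StoredTicket> cursor, long nowMillis) {
        List<StoredTicket> expired = new ArrayList<>();
        while (cursor.hasNext()) {
            StoredTicket ticket = cursor.next();
            if (ticket.isExpired(nowMillis)) {
                expired.add(ticket);
            }
        }
        return expired;
    }
}
```

The point is that memory use is bounded by the number of expired tickets
plus the one ticket currently being inspected, rather than by the size of
the whole table.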

This ran fine for about another two months, which brings us to now. I again
received a report of OutOfMemory errors and couldn't figure out where it
went wrong, since I had tested the previous setup with a database with close
to 650K TGTs and the cleaner ran fine. When I got a heap dump I was shocked
to see that there were tickets in there which took close to 10 MB of memory
(deserialized). After investigation, this was pretty much all allocated in
the 'services' map that is stored in the TGT.

As far as I understand (and please correct me if I'm wrong) this is a map
{ServiceTicketId -> Service} and an entry is put into this map whenever an
ST is granted for this TGT. So this is (seems to be?) an ever-growing map
for every TGT, and thus every ticket in the database keeps growing over
time. Not a sustainable situation in our case since we have this long
timeout.

I'm thinking this is because of the Single Logout feature: at that point we
can retrieve which STs have actually been given out and whether or not we
need to trigger a (backchannel) logout on that service. However, for that
use case it's not really logical to keep track of all STs that have ever
been given out, right? I would say that in this case:

   1. user has TGT-1
   2. user goes to service 1 and generates/validates ST-1 for TGT-1
   3. user gets a client session for service 1
   4. user goes to lunch and client session times out for service 1
   5. user uses service 1 again and generates/validates ST-2 for TGT-1
   6. user gets a new client session for service 1
   7. ...

At this point TGT-1 contains two entries in the 'services' map (ST-1 ->
service 1, ST-2 -> service 1). But I would say a logout for the first
client session (3) doesn't ever need to happen again. This timeout is not
explicitly signalled to CAS of course, but isn't the fact that another ST is
generated / validated for the same service (5) enough of a signal to drop
any information about ST-1 from the 'services' map? Is this information
still relevant for anything else?
If not, I would have expected this 'services' map to be structured the other
way around, { Service -> Service Ticket }, to keep track of the latest ST
given out for a given service.
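
To illustrate the difference between the two shapes, here is a small sketch.
It is not the CAS implementation: ServicesMapSketch and grant are made-up
names, and I represent both the ST and the service as plain Strings to keep
it short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the bookkeeping a TGT does when an ST is granted.
final class ServicesMapSketch {
    // Shape as I understand it today: one entry per granted ST, never removed.
    final Map<String, String> byTicketId = new LinkedHashMap<>();

    // Shape I would have expected: one entry per service, latest ST only.
    final Map<String, String> latestByService = new LinkedHashMap<>();

    void grant(String serviceTicketId, String service) {
        byTicketId.put(serviceTicketId, service);      // grows with every ST
        latestByService.put(service, serviceTicketId); // overwrites the older ST
    }
}
```

With the scenario above (ST-1 and ST-2 both for service 1), the first map
ends up with two entries and the second with one. Over 90 days the first
grows with the number of STs granted, the second only with the number of
distinct services the user visited.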

... Any insights into how to remedy this situation as best as possible? This
code is so deep in CAS that I can't really work around it as far as I can
see. I was thinking of a background thread (kind of like the cleaner) that
will clean up these services with some nasty reflection stuff :-S, but I'd
rather have another solution of course.
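
Leaving the reflection part aside, the pruning step itself could look
roughly like the sketch below. ServicesPrunerSketch and keepLatestPerService
are made-up names, services are plain Strings here, and it assumes the map
iterates in the order the STs were granted, which I'm not sure holds for the
real map inside the TGT.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ServicesPrunerSketch {
    // Input:  { ST id -> service }, assumed to iterate in grant order.
    // Output: the same kind of map, keeping only the last ST seen per service.
    static Map<String, String> keepLatestPerService(Map<String, String> services) {
        // First pass: invert the map so that later STs overwrite earlier ones.
        Map<String, String> latestByService = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : services.entrySet()) {
            latestByService.put(entry.getValue(), entry.getKey());
        }
        // Second pass: invert back to the original { ST id -> service } shape.
        Map<String, String> pruned = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : latestByService.entrySet()) {
            pruned.put(entry.getValue(), entry.getKey());
        }
        return pruned;
    }
}
```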

--
Auke
