Let me explain the Ticket Serialization collision problem with an example.
We imagine a user we call Tabguy. He is already logged into CAS SSO, but he
likes to use the option in his browser to open a group of bookmarks as separate
tabs.
We assume that the browser loads the contents of all the tabs immediately,
using multiple threads, rather than waiting for him to click on each tab before
fetching its page.
So more or less simultaneously the browser generates two or more requests for
applications that use CAS, and because there is a TGT and a cookie they are
non-interactive requests for new STs.
One of the tabs happens to come first and logs in. Because of the SingleSignOut
support, this not only generates the ST but also adds an entry to the Map, kept
in the TGT, of issued STs and their Services.
The ST is put into the TicketRegistry and disappears into the mumbleCache (eh-,
mem-, jboss-) implementation of TicketRegistry. In practice, the default
(paranoid) configuration of all CAS TicketRegistries causes a synchronous
replication of the ST, but because the ST has a reference to the TGT the
writeObject() method also tries to make a copy of the TGT.
[Once upon a time (2.4.2) the TGT also had a table of references to STs, but
this is no longer the case. So, thankfully, you only get the ST and the TGT
(and its associated Authentication, Principal, Credentials), but no other STs.]
However, the Web Server is multithreaded, and it has assigned a second thread
to handle the second tab, and so on. The CPU is multicore, so the threads run
concurrently.
At some point, the second tab thread issues an ST. As part of that process, the
thread is trying to add a new ST ID and Service to the Map in the TGT
maintained for SingleSignOut.
Meanwhile, the first thread is trying to Serialize the ST. Since there is a
reference to the TGT, it also Serializes the TGT. Because the TGT has a Map,
serialization internally obtains an iterator over the Map. It starts to iterate
entries in the Map.
Now back to the second thread: it is trying to add an entry to the Map that the
first thread is trying to iterate through. That is a big NO-NO in Java. It
will work 99.9999% of the time, and maybe all the time if the map is small. But
at some point adding a new element to the Map reorganizes it enough to break
the iterator, and then you get a ConcurrentModificationException.
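The failure mode can be reproduced deterministically without threads. The
sketch below is only an illustration: FailFastDemo and the ticket IDs and URLs
are made up, and an explicit iterator stands in for the one that serialization
obtains internally. It shows that a put() in the middle of an iteration makes
the next step of the iterator throw.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical demo class; none of these names come from CAS.
class FailFastDemo {
    // Returns true if modifying the map in mid-iteration broke the iterator.
    static boolean iteratorBreaksOnPut() {
        Map<String, String> services = new HashMap<>();
        services.put("ST-1", "https://app1.example.org");
        services.put("ST-2", "https://app2.example.org");

        // Stand-in for the first thread: it has started walking the map.
        Iterator<Map.Entry<String, String>> it = services.entrySet().iterator();
        it.next();

        // Stand-in for the second thread: it records a newly issued ST.
        services.put("ST-3", "https://app3.example.org");

        try {
            it.next(); // the iterator notices the structural change here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("iterator broken: " + iteratorBreaksOnPut());
    }
}
```

With real threads the detection is best effort, which is part of why the
problem shows up only occasionally.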
TicketRegistries synchronize addTicket operations with each other, but you
cannot synchronize with the writeObject() call because it is buried somewhere
inside mumbleCache.
The solution is to modify the class being serialized (TicketGrantingTicketImpl)
and add a standard bit of boilerplate:
    private synchronized void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
    }
This is a Java idiom that tells the Serialization mechanism to lock the object
before serializing (and iterating through) it.
This automatically synchronizes with any method of the class that is also
declared to be synchronized. So if the methods that add an entry to the Map
are declared to be synchronized then the Map.put will wait for the iteration or
the iteration will wait for the Map.put to end and all is safe.
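To make the pairing concrete, here is a minimal sketch of a class that has both
halves: a synchronized mutator and a synchronized writeObject. SketchTgt is a
made-up stand-in, not the real TicketGrantingTicketImpl, and SyncWriteDemo is
only a small harness that serializes the object while another thread keeps
adding entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical stand-in for TicketGrantingTicketImpl; not the CAS class.
class SketchTgt implements Serializable {
    private static final long serialVersionUID = 1L;
    private final HashMap<String, String> services = new HashMap<>();

    // The mutator takes the same lock (this object's monitor) as writeObject.
    synchronized void addService(String stId, String serviceUrl) {
        services.put(stId, serviceUrl);
    }

    // Invoked by ObjectOutputStream; the monitor is held while the fields,
    // including the map, are written out.
    private synchronized void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
    }
}

class SyncWriteDemo {
    // Serializes one TGT repeatedly while a second thread adds entries.
    // Returns the number of serialization attempts that failed.
    static int run(int rounds) {
        SketchTgt tgt = new SketchTgt();
        Thread adder = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                tgt.addService("ST-" + i, "https://app.example.org/" + i);
            }
        });
        adder.start();

        int failures = 0;
        for (int i = 0; i < rounds; i++) {
            try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
                out.writeObject(tgt);
            } catch (IOException | RuntimeException e) {
                failures++;
            }
        }
        try {
            adder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + run(1000));
    }
}
```

The lock only helps if every method that touches the Map is synchronized on
the same object; one unsynchronized put() reopens the window.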
This presupposes that mumbleCache was written by people smart enough not to try
to serialize one object while they hold the lock on another object in the same
category. The Java idiom is widely enough used that it is unlikely that anyone
would be dumb enough to do that, but we have to accept mumbleCache as a black
box because that is the deal.
THIS IS THE ISSUE. The fix is easy, but everyone has to sign off that they
believe that none of the TicketRegistry implementations was written by people
who do not know how to handle synchronized objects. Getting that agreement is
the entire problem.
Note that this problem is not really load related. Tabguy can create the
problem on a CAS nobody else is using. It depends on two concurrent threads in
the Web Server processing two concurrent requests from the same browser
colliding at exactly the right moment. The window is so small that you don’t
see it much.
[Though back in 2.4.2, when the TGT had a table of references to STs,
serialization could process hundreds of tickets and megabytes of data, and the
iterator had to remain valid through the entire process, so the window was
enormous and it was easy to hit.]
From: Misagh Moayyed [mailto:[email protected]]
Sent: Thursday, December 11, 2014 12:25 PM
To: [email protected]
Subject: RE: [cas-dev] Reducing CASImpl's complexity: ArgExtractors and more
Small note on the serialization issue before diving deeper: Part of the
difficulty here is the assumption that the entire object could be serializable
which would make it challenging when things start to cross reference each other
in a non-trivial way. I suppose if the design separated the actual object model
from the serialization model this problem would be reduced to some extent.
Something like a serialization proxy might work well (which I believe is
something Scott recently did with the CAS client), but it is still lots of
boilerplate code.
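For reference, a serialization proxy could look roughly like the sketch below.
This is an illustration of the general Java pattern (writeReplace plus
readResolve), not what the CAS client does; ProxiedTgt and SerializedForm are
invented names. The live object hands the stream a snapshot taken under its
own lock, so serialization never iterates the live Map.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical ticket class for illustration; not part of CAS.
class ProxiedTgt implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String id;
    private final Map<String, String> services = new HashMap<>();

    ProxiedTgt(String id) { this.id = id; }

    String getId() { return id; }

    synchronized void addService(String stId, String serviceUrl) {
        services.put(stId, serviceUrl);
    }

    synchronized Map<String, String> getServices() {
        return new HashMap<>(services);
    }

    // Serialization writes the returned proxy instead of this object.
    // The copy is made under the lock, so the stream sees a stable snapshot.
    private synchronized Object writeReplace() {
        return new SerializedForm(id, new HashMap<>(services));
    }

    // Refuse streams that contain the live class directly.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("serialization proxy required");
    }

    // The serialization model, kept separate from the object model.
    private static final class SerializedForm implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String id;
        private final HashMap<String, String> services;

        SerializedForm(String id, HashMap<String, String> services) {
            this.id = id;
            this.services = services;
        }

        // Deserialization calls this to rebuild the real object.
        private Object readResolve() {
            ProxiedTgt tgt = new ProxiedTgt(id);
            services.forEach(tgt::addService);
            return tgt;
        }
    }

    // Round-trips a ticket through Java serialization.
    static ProxiedTgt roundTrip(ProxiedTgt tgt) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(tgt);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ProxiedTgt) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cost is visible even in this small case: a second class, a copy on every
write, and a rebuild on every read.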
From: Jérôme LELEU [mailto:[email protected]]
Sent: Thursday, December 11, 2014 6:24 AM
To: [email protected]
Subject: Re: [cas-dev] Reducing CASImpl's complexity: ArgExtractors and more
Hi,
Thanks for jumping into the discussion. All opinions are welcome, and not only
from committers.
You raise an interesting idea about the relationships between tickets. So far,
actual service ticket objects are held inside TGTs instead of simple
identifiers. We could use identifiers: it would certainly make things easier
for Serialization but would require more steps to get the information: two
steps to get all the service tickets of a TGT instead of one.
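As a rough sketch of the identifier approach and its two-step lookup (the class
and method names here are invented for illustration and are not the CAS
TicketRegistry API; a plain service URL stands in for a full service ticket
object):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry shape; not the CAS API.
class IdRegistrySketch {
    // ST ID -> service URL (standing in for the full ST object).
    private final Map<String, String> serviceTickets = new ConcurrentHashMap<>();
    // TGT ID -> IDs (not objects) of the STs it has issued.
    private final Map<String, Set<String>> issuedByTgt = new ConcurrentHashMap<>();

    void addServiceTicket(String tgtId, String stId, String serviceUrl) {
        serviceTickets.put(stId, serviceUrl);
        issuedByTgt.computeIfAbsent(tgtId, k -> ConcurrentHashMap.newKeySet()).add(stId);
    }

    void removeServiceTicket(String stId) {
        serviceTickets.remove(stId);
    }

    // Step 1: read the ID list kept for the TGT.
    // Step 2: look each ID up in the registry.
    List<String> servicesFor(String tgtId) {
        List<String> result = new ArrayList<>();
        Set<String> ids = issuedByTgt.getOrDefault(tgtId, Collections.<String>emptySet());
        for (String stId : ids) {
            String service = serviceTickets.get(stId);
            if (service != null) { // the ST may have been removed in the meantime
                result.add(service);
            }
        }
        return result;
    }
}
```

Serializing a ticket then writes only strings, at the price of the extra
lookup and of handling IDs whose tickets are already gone.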