Let me explain the Ticket Serialization collision problem with an example.

We imagine a user we call Tabguy. He is already logged into CAS SSO, but he likes to use the browser option that opens a group of bookmarks as separate tabs. We assume that the browser loads the contents of all the tabs immediately, using threads, rather than waiting for him to click on each tab to display the page. So more or less simultaneously the browser generates two or more requests for applications that use CAS, and because there is a TGT and a cookie they are non-interactive requests for new STs.

One of the tabs happens to come first and log in. Because of the SingleSignOut support, this not only generates the ST but also adds an entry to the Map in the TGT of issued STs and their Services. The ST is put into the TicketRegistry and disappears into the mumbleCache (eh-, mem-, jboss-) implementation of TicketRegistry. In practice, the default (paranoid) configuration of all CAS TicketRegistries causes a synchronous replication of the ST, but because the ST has a reference to the TGT, the writeObject() method also tries to make a copy of the TGT. [Once upon a time (2.4.2) the TGT also had a table of references to STs, but this is no longer the case. So, thankfully, you only get the ST and the TGT (and its associated Authentication, Principal, and Credentials), but no other STs.]

However, the Web Server is multithreaded, and it has assigned a second thread to handle the second tab, and so on. The CPU is multicore, so the threads run concurrently. At some point, the second tab thread issues an ST. As part of that process, the thread tries to add a new ST ID and Service to the Map in the TGT maintained for SingleSignOut. Meanwhile, the first thread is trying to Serialize the ST. Since there is a reference to the TGT, it also Serializes the TGT. Because the TGT has a Map, serialization internally obtains an iterator over the Map. It starts to iterate over the entries in the Map.
Now back to the second thread: it is trying to add an entry to the Map that the first thread is iterating through. That is a big NO-NO in Java. It will work 99.9999% of the time, and maybe all the time if the map is small. But if the new element is added while the iteration is still in progress, the iterator detects that the Map has changed underneath it, and then you get a ConcurrentModificationException. TicketRegistries synchronize addTicket operations with each other, but you cannot synchronize with writeObject() because that call is buried somewhere in mumbleCache.
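The collision described above can be reproduced deterministically in a single thread. The following is only a minimal sketch, not CAS code: the class name, the ticket IDs, and the example.edu URLs are made up for illustration, and the two "threads" are simulated by interleaving the calls by hand. It shows the fail-fast behavior of a plain java.util.HashMap iterator. Whether serialization itself walks the Map through such an iterator depends on the Map implementation and the Java version, so treat this as an illustration of the hazard rather than of the exact code path inside any cache.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class IteratorCollisionSketch {

    // Returns true if adding an entry in the middle of an iteration makes
    // the iterator fail with ConcurrentModificationException.
    static boolean collides() {
        // Hypothetical stand-in for the TGT's Map of issued STs and Services.
        Map<String, String> services = new HashMap<>();
        services.put("ST-1", "https://app1.example.edu/");
        services.put("ST-2", "https://app2.example.edu/");

        Iterator<Map.Entry<String, String>> it = services.entrySet().iterator();
        try {
            it.next();                                          // "thread 1" starts walking the Map
            services.put("ST-3", "https://app3.example.edu/");  // "thread 2" adds an entry
            it.next();                                          // the iterator notices the change
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("collision detected: " + collides());
    }
}
```

With real threads the same thing happens only when the put lands between two calls to next(), which is why the failure is rare and hard to reproduce on demand.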
The solution is to modify the class being serialized (TicketGrantingTicketImpl) and add a standard bit of boilerplate:

    private synchronized void writeObject(ObjectOutputStream s) throws IOException { s.defaultWriteObject(); }

This is a Java idiom that tells the Serialization mechanism to lock the object before serializing (and iterating through) it. This automatically synchronizes with any method of the class that is also declared to be synchronized. So if the methods that add an entry to the Map are declared to be synchronized, then the Map.put will wait for the iteration, or the iteration will wait for the Map.put to end, and all is safe.

This presupposes that mumbleCache was written by people smart enough not to try to serialize one object while they hold the lock on another object in the same category. The Java idiom is widely enough used that it is unlikely that anyone would be dumb enough to do that, but we have to accept mumbleCache as a black box because that is the deal.

THIS IS THE ISSUE. The fix is easy, but everyone has to sign off that they believe that none of the TicketRegistry implementations was written by people who do not know how to handle synchronized objects. Getting that agreement is the entire problem.

Note that this problem is not really load related. Tabguy can create the problem on a CAS nobody else is using. It depends on two concurrent threads in the Web Server, processing two concurrent requests from the same browser, colliding at exactly the right moment. The window is so small that you don't see it much. [Though back in 2.4.2, when the TGT had a table of references to STs, serialization could process hundreds of tickets and megabytes of data, and the iterator had to remain valid through the entire process, so the window was enormous and it was easy to hit.]
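To make the idiom concrete, here is a small sketch under stated assumptions. The Tgt class below is a hypothetical stand-in for TicketGrantingTicketImpl, not the real class, and addService and serviceCount are invented names. One thread keeps adding entries through a synchronized method while another keeps serializing the object; because writeObject is also synchronized, the two never touch the Map at the same time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedTicketSketch {

    // Hypothetical stand-in for TicketGrantingTicketImpl.
    static class Tgt implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Map<String, String> services = new HashMap<>();

        // Every method that touches the Map takes the object's lock.
        synchronized void addService(String stId, String service) {
            services.put(stId, service);
        }

        synchronized int serviceCount() {
            return services.size();
        }

        // The boilerplate from the text: serialization takes the same lock.
        private synchronized void writeObject(ObjectOutputStream s) throws IOException {
            s.defaultWriteObject();
        }
    }

    static byte[] serialize(Tgt tgt) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(tgt);
        }
        return bytes.toByteArray();
    }

    static Tgt deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Tgt) in.readObject();
        }
    }

    // One thread adds entries while the calling thread keeps serializing.
    // Returns the number of entries seen after a final round trip.
    static int runConcurrently(int entries) throws Exception {
        Tgt tgt = new Tgt();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < entries; i++) {
                tgt.addService("ST-" + i, "https://app" + i + ".example.edu/");
            }
        });
        writer.start();
        while (writer.isAlive()) {
            serialize(tgt); // waits for any addService call in progress
        }
        writer.join();
        return deserialize(serialize(tgt)).serviceCount();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("entries after round trip: " + runConcurrently(10_000));
    }
}
```

Note what the sketch does not cover: it says nothing about code that serializes the object while already holding some other lock, which is exactly the black-box question about mumbleCache raised above.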
From: Misagh Moayyed [mailto:mmoay...@unicon.net]
Sent: Thursday, December 11, 2014 12:25 PM
To: cas-dev@lists.jasig.org
Subject: RE: [cas-dev] Reducing CASImpl's complexity: ArgExtractors and more

Small note on the serialization issue before diving deeper: Part of the difficulty here is the assumption that the entire object could be serializable, which would make it challenging when things start to cross reference each other in a non-trivial way. I suppose if the design separated the actual object model from the serialization model this problem would be reduced to some extent. Something like a serialization proxy might work well (which I believe is something Scott recently did with the CAS client) but it is still lots of boilerplate code.

From: Jérôme LELEU [mailto:lel...@gmail.com]
Sent: Thursday, December 11, 2014 6:24 AM
To: cas-dev@lists.jasig.org
Subject: Re: [cas-dev] Reducing CASImpl's complexity: ArgExtractors and more

Hi,

Thanks for jumping into the discussion. All opinions are welcome, and not only from committers.

You raise an interesting idea about the relationships between tickets. So far, real service ticket objects are held inside TGTs instead of simple identifiers. We could use identifiers: it would certainly make things easier for Serialization but would require more steps to get the information: 2 steps to get all the service tickets of a TGT instead of one.