Re: [cas-dev] Diagnosing Service Ticket validation errors

Waldbieser, Carl Tue, 21 Jul 2015 06:08:04 -0700

Bryan,

I can't see your architecture diagram, because I don't have a Lucid Chart 
account.
You are correct in your assessment of why an ST could fail to validate:
  (1) invalid ticket
  (2) reused ticket (special case of #1)
  (3) timed-out ticket (special case of #1)


Additionally, I think that if the TGT expires before the ST is validated, the 
ST also becomes invalid.
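
To make the cases above concrete, here is a rough Python sketch of the order in 
which a validation could fail. It is only a model for reasoning about the 
failure modes, not the CAS implementation: the ServiceTicket class, the 
why_validation_fails function, and the timeout and use-count values are all 
hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ServiceTicket:
    # Hypothetical model for illustration; not the CAS class of the same name.
    ticket_id: str
    created_at: float          # creation time, seconds since epoch
    uses: int = 0              # number of successful validations so far
    tgt_expired: bool = False  # True once the parent TGT has expired


def why_validation_fails(
    registry: Dict[str, ServiceTicket],
    ticket_id: str,
    now: float,
    st_timeout: float = 10.0,  # placeholder value, check your own config
    max_uses: int = 1,         # placeholder value, check your own config
) -> Optional[str]:
    """Return a reason string if validation would fail, else None."""
    st = registry.get(ticket_id)
    if st is None:
        # Covers a bogus ticket and a ticket this node has not received yet.
        return "invalid: ticket not found in registry"
    if st.tgt_expired:
        return "invalid: parent TGT expired"
    if st.uses >= max_uses:
        return "reused: ticket already validated"
    if now - st.created_at > st_timeout:
        return "timed out"
    st.uses += 1  # record the successful validation
    return None
```

Note that "not found in registry" looks the same to the client whether the 
ticket never existed or simply has not reached this node yet.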

Since you mention you are using Hazelcast, I will assume you have a multi-CAS 
architecture and are propagating tickets between your nodes.
In the cases where your ST validations failed, are there any instances where 
the ST failed to validate at the *same* CAS node where the ST was issued?

If so, then it would seem to be a timeout issue (you could try increasing the 
ST timeout to see if that helps).  If not, it may be that your STs are being 
validated before Hazelcast has had a chance to propagate the ticket to the peer 
CAS node.
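
One way to answer the same-node question is to compare the suffix of the ticket 
ID with the node that handled the validation. The sketch below assumes that 
ticket IDs have the form ST-<sequence>-<random>-<suffix>, that the random part 
contains no hyphen, and that each node is configured with a distinct suffix 
that identifies it. The function names are hypothetical, and you would need to 
get the validating node's name from your own logs or load balancer.

```python
def issuing_node(ticket_id: str) -> str:
    """Extract the node suffix from an ID like ST-<seq>-<random>-<suffix>."""
    # Split on the first three hyphens only, so a suffix that itself
    # contains hyphens (for example a hostname) is kept in one piece.
    parts = ticket_id.split("-", 3)
    if len(parts) != 4 or parts[0] != "ST":
        raise ValueError(f"unexpected service ticket format: {ticket_id!r}")
    return parts[3]


def validated_on_issuing_node(ticket_id: str, validating_node: str) -> bool:
    """True if the ticket was validated on the node whose suffix it carries."""
    return issuing_node(ticket_id) == validating_node
```

If every failure has validated_on_issuing_node(...) == False, that points 
toward propagation rather than the ST timeout.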

I am working with a multi-node CAS in pre-production using Hazelcast, and the 
CAS logs show something like this when Hazelcast receives a ticket from a peer:

  2015-07-17 16:53:53,611 DEBUG [net.unicon.cas.addons.ticket.registry.HazelcastTicketRegistry] - Returning Ticket[ST-260-cyVBxGmIHyDfrff0nplR-cas1.dev.lafayette.edu] from the Hazelcast IMap

If that entry came *after* the failed ST validation attempt, that would explain 
why the ST validation failed.
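
If you want to check this across many failures, a small script can pull the 
timestamp of the first such entry for a ticket and compare it with the time of 
the failed validation. This is a sketch that assumes your log lines match the 
format shown above, each on a single line, and that the clocks on the machines 
involved are in sync. The function names are hypothetical, and the failure time 
has to come from wherever you record it (for example the client's log).

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Matches the HazelcastTicketRegistry DEBUG entry quoted above.
LOG_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) DEBUG\s+"
    r"\[net\.unicon\.cas\.addons\.ticket\.registry\.HazelcastTicketRegistry\] - "
    r"Returning Ticket\[(?P<ticket>[^\]]+)\] from the Hazelcast IMap"
)


def parse_ts(text: str) -> datetime:
    """Parse a timestamp such as '2015-07-17 16:53:53,611'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def first_registry_hit(lines: Iterable[str], ticket_id: str) -> Optional[datetime]:
    """Earliest time the registry logged returning this ticket, or None."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("ticket") == ticket_id:
            hits.append(parse_ts(m.group("ts")))
    return min(hits) if hits else None


def arrived_after_failure(lines: Iterable[str], ticket_id: str, failure_time: str) -> bool:
    """True if the first registry entry for the ticket is later than the failure."""
    hit = first_registry_hit(lines, ticket_id)
    return hit is not None and hit > parse_ts(failure_time)
```

You would run this against the log of the node that handled the failed 
validation, passing the lines of that file and the ticket ID in question.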

Thanks,
Carl

----- Original Message -----
From: "Bryan Wooten" <bryan.woo...@utah.edu>
To: cas-dev@lists.jasig.org
Sent: Monday, July 20, 2015 6:10:59 PM
Subject: [cas-dev] Diagnosing Service Ticket validation errors

Sorry if I am spamming this list but I am desperate.

We are getting random ST validation errors from our CAS clients, both internal 
and SaaS applications. These result in 500 errors for the end user.

On July 14th we got over 2000 of these errors out of about 30k successful 
logins. This led to (thanks, ITIL) awareness up to the VP level. I am under the 
gun to find a “solution” before the start of school August 24th.

I have turned up log level to debug on the CAS servers. I see successful 
validations in the logs, but not unsuccessful validations.
I also see ST creation in the audit log.

Now if I understand how CAS works, there can only be 3 reasons an ST won’t 
validate: it is being reused, it has timed out or it does not exist / is 
corrupted.

I just can’t find the actual code that validates the ticket and can log the 
EXACT reason.

Can someone point me to the method(s) that does the validation? I just want to 
add a log.debug message at the point of failure.

Today I found a validation failure that had 2 attempts. I can see when the ST 
was created, and both attempts failed, so it wasn’t a re-use error?

Other info: version 3.5.2 with Hazelcast ticket registry. I have Hazelcast 
logging set to debug and see some transfer over port 1501.

Here is a diagram of our infrastructure:

https://www.lucidchart.com/invitations/accept/da009b9d-e55f-4f95-9301-e6bd23d508ab

Yeah 2 Load Balancers (?). Netscape is really a Sun App Server. Why 2? Because 
PeopleSoft can’t handle SHA-2 certs on the NetScaler. Yeah, a mess. Not all 
failures go through the Sun App Server, but the majority do.

Thanks for any help.

-Bryan

-- 
You are currently subscribed to cas-dev@lists.jasig.org as: 
waldb...@lafayette.edu
To unsubscribe, change settings or access archives, see 
http://www.ja-sig.org/wiki/display/JSG/cas-dev
