Re: [cas-user] Acceptable ticket replication loss/delay

Tom Poage Fri, 31 May 2013 13:49:10 -0700

Thank you all for the comments/info.

On 05/31/2013 06:57 AM, William G. Thompson, Jr. wrote:
>> So I get an ST validation failure on the companion node in about 0.3% (3 in 
>> 1000) of the cases.
> 
> What was the cause of the ST validation failure?  What was in the cas.log?


The error is that the ST doesn't exist:

> 2013-05-31 11:15:11,842 INFO [org.jasig.cas.CentralAuthenticationServiceImpl] - ServiceTicket [] does not exist.
> 2013-05-31 11:15:11,843 INFO [com.github.inspektr.audit.support.Slf4jLoggingAuditTrailManager] - Fri May 31 11:15:11 PDT 2013|CAS||SERVICE_TICKET_VALIDATE_FAILED|audit:unknown|128.120.XX.YYY|128.120.AA.BBB

> <cas:serviceResponse xmlns:cas='http://www.yale.edu/tp/cas'>
>         <cas:authenticationFailure code='INVALID_TICKET'>
>                 ticket &#039;&#039; not recognized
>         </cas:authenticationFailure>
> </cas:serviceResponse>

> Since ehcache is set to synchronous replication the ST should have
> already been available in the other node before CAS returned it to
> your test script.
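For anyone less familiar with the distinction: synchronous replication means the put does not return until the peer copy has been written, so CAS should not hand back an ST before the companion node has it. Below is a toy in-memory sketch of the two modes. The maps and the background thread are hypothetical stand-ins for the two nodes and the replicator. This is not Ehcache's API and ignores the network entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy two-node model (hypothetical): "local" is the issuing node, "peer" is
// the companion node, and a single background thread plays the replicator.
class SyncReplicationSketch {
    final Map<String, String> local = new ConcurrentHashMap<>();
    final Map<String, String> peer = new ConcurrentHashMap<>();
    private final ExecutorService replicator = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "replicator");
        t.setDaemon(true); // don't keep the JVM alive after main exits
        return t;
    });

    // Synchronous: returns only after the peer copy exists, so a ticket
    // handed back to the client is already on both nodes.
    void putSync(String ticketId, String value) {
        local.put(ticketId, value);
        Future<?> ack = replicator.submit(() -> peer.put(ticketId, value));
        try {
            ack.get(); // block until the "remote" write has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while replicating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("replication failed", e.getCause());
        }
    }

    // Asynchronous: returns immediately; the peer copy may not exist yet.
    void putAsync(String ticketId, String value) {
        local.put(ticketId, value);
        replicator.submit(() -> peer.put(ticketId, value));
    }
}
```

Under this model, a validation on the peer right after putSync() can only fail if the ticket was lost, never because it was still in flight.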

I hoped that would be the case. BTW, it turns out I misstated the
conditions for the 0.3% error rate in service ticket validation. The
test script was in fact choosing a random node (of the two) to perform
validation, rather than what I stated (authN on node A, validate on node
B). Since only about half of the validations were going to the other
node, the real error rate with ST replication is roughly twice that.
After modifying the (inherited) test script to do what I originally
stated, I do in fact see an ST validation error rate of about 0.6%. This
exceeds my comfort level.
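The factor of two follows from the test design. With two nodes and the validating node picked at random, only about half of the validations are cross-node. Assuming (as a simplification) that same-node validations never fail, the observed rate is that fraction times the cross-node rate. Below is a small Java sketch of the arithmetic, followed by a seeded Monte Carlo check of the same model. The class and method names are made up for illustration.

```java
import java.util.Random;

// Relates the observed ST validation failure rate to the failure rate on
// cross-node validations, assuming only cross-node validations can fail.
class CrossNodeRate {
    // p_observed = crossFraction * p_cross, so p_cross = p_observed / crossFraction.
    static double crossNodeRate(double observedRate, double crossFraction) {
        if (crossFraction <= 0.0 || crossFraction > 1.0) {
            throw new IllegalArgumentException("crossFraction must be in (0, 1]");
        }
        return observedRate / crossFraction;
    }

    // Monte Carlo version of the original test: authN always on node 0,
    // validation on a node picked uniformly at random from `nodes`.
    static double simulateObservedRate(int nodes, double pCross, int trials, long seed) {
        Random rng = new Random(seed);
        int failures = 0;
        for (int i = 0; i < trials; i++) {
            int validator = rng.nextInt(nodes);
            boolean crossNode = validator != 0;
            if (crossNode && rng.nextDouble() < pCross) {
                failures++;
            }
        }
        return (double) failures / trials;
    }

    public static void main(String[] args) {
        // Two nodes, random validator: half the validations are cross-node.
        double crossFraction = 1.0 - 1.0 / 2;
        System.out.printf("cross-node rate implied by 0.3%% observed: %.4f%n",
                crossNodeRate(0.003, crossFraction));
        System.out.printf("simulated observed rate for 0.6%% cross-node: %.4f%n",
                simulateObservedRate(2, 0.006, 1_000_000, 42L));
    }
}
```

With crossFraction = 0.5, an observed 0.3% corresponds to about 0.6% on cross-node validations, which matches what the corrected script measures.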

FWIW, I'm using The Grinder (http://grinder.sourceforge.net/).

> may be closer to a stress test than a load test.  As Jérôme points out
> the 50ms pause is likely way too short given the actual solution
> architecture.  Consider that there are actually two HTTPS hops for a
> typical ST validation (browser -> service -> CAS).  If you have the
> time, I'd love to hear the results with a 500ms wait time.  And if you
> are really ambitious, I would love to have the results against the
> memcached ticket registry to compare to.

Good point. I didn't take the double hop into account.

So I changed the delay to 500ms, but it makes no difference: the ST
validation error rate (corrected setup: authN on node A, validate on
node B) is still 0.6%.
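One way to read the unchanged 0.6%: if tickets were merely arriving late at the companion node, a longer pause should absorb the lag. Failures that do not move with the pause look more like tickets that never arrive at all. The sketch below is a hypothetical model, not Ehcache's actual replication behavior. It assumes a lost fraction pLoss and a late fraction pLate that shows up lateMillis after issue. Those parameter names and the values in main() are illustrative assumptions, not measurements.

```java
import java.util.Random;

// Hypothetical replication model: each ticket is lost outright (pLoss),
// arrives late after lateMillis (pLate), or is already on the companion
// node when the issuing request returns (the remainder).
class DelaySweep {
    static double failureRate(double pLoss, double pLate, long lateMillis,
                              long delayMillis, int trials, long seed) {
        Random rng = new Random(seed);
        int failures = 0;
        for (int i = 0; i < trials; i++) {
            double r = rng.nextDouble();
            boolean lost = r < pLoss;
            boolean late = !lost && r < pLoss + pLate;
            // Validation on the companion node fails if the ticket never
            // arrives, or arrives after the validation request is made.
            if (lost || (late && delayMillis < lateMillis)) {
                failures++;
            }
        }
        return (double) failures / trials;
    }

    public static void main(String[] args) {
        // Compare a 50 ms and a 500 ms pause for a mix of lost and late tickets.
        for (long delay : new long[] {50, 500}) {
            System.out.printf("delay %d ms -> failure rate %.4f%n",
                    delay, failureRate(0.003, 0.003, 200, delay, 1_000_000, 7L));
        }
    }
}
```

In this model, only the "late" share disappears when the pause grows past lateMillis. A rate that stays flat between 50ms and 500ms is consistent with the "lost" share dominating.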

Anyhow, I was hoping to use Ehcache for a couple of reasons: (1) it is
(natively) capable of replication (vs. the overhead of using Terracotta,
JMS, ...), and (2) bootstrapping the cache from peers means clients don't
have to reauthenticate when a node goes down.

Re: the comments on memcached, I wouldn't mind using it if I could get
comparable behavior (repcached?). Without the errors, of course. :-)
I might still try it, since it would address the occasional need to
perform administrative TGT invalidation. So many cache
implementations, so little time. :-)

Has anyone played with Apache Commons JCS
(http://commons.apache.org/proper/commons-jcs/) with an eye toward CAS?

Thanks!
Tom.
