Scott,
Great. I will grab it and retest. This will probably solve the issue.
Adam
Scott Battaglia wrote:
On Fri, Oct 24, 2008 at 7:58 PM, Adam Rybicki <[EMAIL PROTECTED]>
wrote:
Scott,
I misdiagnosed the issue. I just ran the same test, except this time I ran
only one instance of memcached, and I am still getting a high error rate on
ticket validations. So it has nothing to do with memcached replication. To
investigate further, I disabled the second CAS server, and all errors
are gone. Of course that is not a viable workaround. :-)
My guess is that the error occurs when a ticket issued by one CAS
server is being validated on another CAS server. I could not find a
way to enable debug logging in /cas/serviceValidate, but I think I have
found a major clue. It took most of the day today to hunt this down.
With a single instance of memcached running in verbose mode, you can see a
sequence of messages like this:
<11 add ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1 1 300 2689
>11 STORED
<7 get ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1
>7 sending key ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1
>7 END
<7 replace ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1 1 300 2689
>7 STORED
<7 delete ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1 0
>7 DELETED
This is a case where everything went OK. The sequence below, however,
represents a service ticket that failed to validate, apparently because an
attempt to read the ticket was made before it was actually stored in the
cache!
<11 add ST-8024-tKeeo5gYhjqoQzstAgqO-arybicki1 1 300 2689
<7 get ST-8024-tKeeo5gYhjqoQzstAgqO-arybicki1
>7 END
>11 STORED
There may be some code that synchronizes access to the same object from the
same client. However, it would seem that the service ticket is returned by
CAS before it's actually stored in memcached. If this service ticket is then
presented to another instance of CAS for validation, that instance fails to
retrieve it from memcached because the "add" operation has not yet
completed.
Again, I have to emphasize that this is not an unrealistic test. JMeter is
simply following redirects at the time of the failure, just as a browser
would.
We never saw that in production, and we ran 500 virtual users. However, if
you are experiencing it, you most likely could update the
MemcacheTicketRegistry to block on the Futures. I've actually updated the
code in HEAD with an option to block on Futures. :-)
I have not tried it at all, since I wrote it all of 30 seconds ago. You can
grab it from HEAD and try it out. The new property to enable it is
"synchronizeUpdatesToRegistry".
Let me know if it helps/doesn't help.
-Scott
Adam
Scott Battaglia wrote:
You have no need for sticky sessions. If you have two repcached servers and
you've told your CAS instance about both of them, the memcached client
essentially sees them as two memcached servers (since it's not familiar with
repcached).
The memcached client works by taking a hash of the key, which determines
which instance of memcached/repcached the item is stored on. repcached will
then do its async replication. When you come to validate a ticket, the
memcached client will again hash the key to determine which server the item
is stored on. If that server is unreachable (as determined by the memcached
client), it will try the next server that would likely hold the data.
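Illustrative only (the real client's hashing is more sophisticated than
this), but a minimal sketch of that pick-then-fail-over logic might look
like:

    import java.util.Arrays;
    import java.util.List;

    // Not the real memcached client code -- just the idea: hash the key to
    // pick a primary server, then fall back to the next server in the list
    // when the primary is unreachable.
    public final class ServerSelector {

        private final List<String> servers;

        public ServerSelector(final List<String> servers) {
            this.servers = servers;
        }

        public String pick(final String key) {
            final int primary =
                    Math.abs(key.hashCode() % this.servers.size());
            for (int i = 0; i < this.servers.size(); i++) {
                final String candidate =
                        this.servers.get((primary + i) % this.servers.size());
                if (isReachable(candidate)) {
                    return candidate;
                }
            }
            throw new IllegalStateException("no memcached server reachable");
        }

        // Stand-in for the client's real liveness tracking.
        private boolean isReachable(final String server) {
            return true;
        }

        public static void main(final String[] args) {
            final ServerSelector selector = new ServerSelector(
                    Arrays.asList("memcache1:11211", "memcache2:11211"));
            // The same key always hashes to the same server, which is why a
            // ticket issued on one CAS node can be validated on another.
            System.out.println(selector.pick("ST-8023-example"));
        }
    }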
-Scott
-Scott Battaglia
PGP Public Key Id: 0x383733AA
LinkedIn: http://www.linkedin.com/in/scottbattaglia
On Fri, Oct 24, 2008 at 8:21 AM, Andrew Ralph Feller, afelle1 <[EMAIL PROTECTED]> wrote:
So what you are saying is that even with replication enabled, CAS clusters
using asynchronous replication should have sticky sessions regardless? I
realize that CAS clusters with synchronous replication have no need for
sticky sessions, since a write goes to all servers before the user can move
on.
Andrew
It actually shouldn't matter if the async
works or not. The memcache clients are designed to hash to a
particular server and only check the backup servers if the primary
isn't available.
So you should always be validating against the original server unless it's
no longer there.
-Scott
On Thu, Oct 23, 2008 at 9:17 PM, Adam Rybicki <[EMAIL PROTECTED]> wrote:
Scott,
I have run into an issue with MemCacheTicketRegistry and was wondering
if you have any thoughts. I didn't want to create a new thread for
this note. Anyone else with comments should feel free to reply, too.
;-)
My tests have shown that when a ticket is generated on one CAS cluster
member, it may sometimes fail to validate. This is apparently because
memcached's asynchronous replication did not manage to send the ticket
replica in time. Fast as repcached may be, under a relatively light load ST
validation failed in 0.1% of cases, or once in 1,000 attempts. You would
expect the following sequence of steps to take a fair amount of time:
- Browser accesses a CAS-protected service
- Service redirects to CAS for authentication
- CAS validates the TGT
- CAS issues the ST for the service
- CAS redirects the browser to the service
- Service sends the ST for validation
But they are fast! My JMeter testing showed the whole sequence taking 28
milliseconds under light load on the CAS server, which is amazingly fast.
Please note that in real life this can be just as fast, because the browser,
CAS, and the service perform these steps without the user slowing them down.
CAS is indeed a lightweight system, and memcached does nothing to slow it
down. It seems that in 0.1% of cases this round trip outruns repcached's
replication under light load. The bad news is that under heavy load the
failure rate increases; I've seen it as bad as 8%.
Have you or anyone else seen this? Have you had to work around this?
Thanks,
Adam
Scott Battaglia wrote:
On Tue, Oct 14, 2008 at 11:15 AM, Andrew Ralph Feller, afelle1 <[EMAIL PROTECTED]>
wrote:
Hey Scott,
Thanks for answering some questions; really appreciate it. Just a
handful more:
- What happens whenever the server it intends to replicate with is down?
It doesn't replicate. :-) The client will send its request to the primary
server, and if the primary server is down, it will send it to the secondary.
The repcache server itself will not replicate to the other server if it
can't find it.
- What happens whenever it comes back up?
The repcache servers will sync with each other. The memcache clients will
continue to function as they should.
- Does the newly recovered machine synchronize itself with the other
servers?
The newly recovered machine will synchronize with its paired memcache
server.
-Scott
Thanks,
Andrew
Memcache, as far as I know, uses a hash of the key to determine which server
to write to (and then with repcache, it's replicated to its pair, which you
configure).
-Scott
Scott,
I've looked at the sample configuration file on the JA-SIG wiki, but I was
curious how memcached handles cluster membership, for lack of a better word.
One of the things we are getting burned on by JBoss/JGroups is how
frequently the cluster becomes fragmented.
Thanks,
Andrew
We've disabled the registry cleaners since memcached has explicit timeouts
(which are configurable on the registry). We've configured it by default
with 1 GB of RAM, I think, though I doubt we need that much.
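A minimal sketch of that expiration behavior with the spymemcached client
(the host names and the 300-second timeout are just examples):

    import java.io.IOException;
    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.MemcachedClient;

    // Each ticket is stored with an explicit expiration, so memcached
    // evicts it on its own and no separate registry cleaner has to run.
    public final class ExpiryDemo {
        public static void main(final String[] args) throws IOException {
            final MemcachedClient client = new MemcachedClient(
                    AddrUtil.getAddresses("memcache1:11211 memcache2:11211"));
            // 300 seconds mirrors a typical service-ticket timeout.
            client.set("ST-1-example", 300, "serialized ticket goes here");
            client.shutdown();
        }
    }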
-Scott
I've been working on updating from 3.2 to 3.3 and wanted to give
memcached a try instead of JBoss. I read Scott's message about
performance and we've had good success here with memcached for other
applications. It also looks like using memcached instead of JBoss will
simplify the configuration changes for the CAS server.
I do have the JBoss replication working with CAS 3.2, but pounding the heck
out of it with JMeter will cause some not-so-nice stuff to happen. I'm using
VMware VI3 and configured an isolated switch for the clustering and Linux-HA
traffic. I do see higher traffic levels coming to my cluster in the future,
but I'm not sure if they'll reach the levels from my JMeter test. (I'm just
throwing this out there because of the recent "Best practice" thread.)
If I use memcached, is the ticketRegistryCleaner not needed anymore? I left
those beans in the ticketRegistry.xml file and saw all kinds of errors.
After taking them out, it seems to load fine and appears to work, but I
wasn't sure what the behavior is and I haven't tested it further. What if
memcached fills up all the way? Does anyone have a general idea of how much
memory to allocate to memcached with regard to concurrent logins and tickets
stored?
Thanks,
Pat
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Patrick Hennessy ([EMAIL PROTECTED])
Senior Systems Specialist
Division of Information and Educational Technology
Delaware Technical and Community College
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
--
Andrew R. Feller, Analyst
Information Technology Services
200 Fred Frey Building
Louisiana State University
Baton Rouge, LA 70803
(225) 578-3737 (Office)
(225) 578-6400 (Fax)
_______________________________________________
Yale CAS mailing list
[email protected]
http://tp.its.yale.edu/mailman/listinfo/cas