Also, how is your MemCacheTicketRegistry configured? We have the following snippet, which is exactly the same on both machines (machine names changed to protect the innocent):
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
           http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-2.5.xsd">

    <util:list id="memcachedServers">
        <value>SERVER1.ess.rutgers.edu:11211</value>
        <value>SERVER2.ess.rutgers.edu:11211</value>
    </util:list>

</beans>

And in another file we have the following:

<bean id="ticketRegistry" class="org.jasig.cas.ticket.registry.MemCacheTicketRegistry">
    <constructor-arg index="0">
        <ref bean="memcachedServers" />
    </constructor-arg>
    <constructor-arg index="1" type="int" value="21600" />
    <constructor-arg index="2" type="int" value="300" />
</bean>

-Scott Battaglia
PGP Public Key Id: 0x383733AA
LinkedIn: http://www.linkedin.com/in/scottbattaglia


On Mon, Oct 27, 2008 at 9:20 AM, Scott Battaglia <[EMAIL PROTECTED]> wrote:

> On Fri, Oct 24, 2008 at 7:58 PM, Adam Rybicki <[EMAIL PROTECTED]> wrote:
>
>> Scott,
>>
>> I mis-diagnosed the issue. I just ran the same test, except I only ran one
>> instance of memcached. I am getting a high error rate on ticket validations,
>> so it has nothing to do with memcached replication. To investigate further,
>> I disabled the second CAS server, and all errors are gone. Of course, that
>> is not a viable workaround. :-)
>>
>> My guess is that the error occurs when a ticket issued by one CAS server is
>> being validated on another CAS server. I could not find a way to enable
>> debug logging in /cas/serviceValidate, but I think I have found a major
>> clue. It took most of the day today to hunt this down.
>>
>> With a single instance of memcached running in verbose mode, you can see a
>> sequence of messages like this:
>> ------------------------------
>> <11 add ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1 1 300 2689
>> >11 STORED
>> <7 get ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1
>> >7 sending key ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1
>> >7 END
>> <7 replace ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1 1 300 2689
>> >7 STORED
>> <7 delete ST-8023-M0sU2U2ijyQ53QPYWnGm-arybicki1 0
>> >7 DELETED
>> ------------------------------
>> This is when everything went OK. The sequence below, however, represents a
>> service ticket that failed to validate. That's apparently because an attempt
>> to read the ticket was made before it was actually stored in the cache!
>> ------------------------------
>> <11 add ST-8024-tKeeo5gYhjqoQzstAgqO-arybicki1 1 300 2689
>> <7 get ST-8024-tKeeo5gYhjqoQzstAgqO-arybicki1
>> >7 END
>> >11 STORED
>> ------------------------------
>> There may be some code that synchronizes access to the same object from the
>> same client. However, it would seem that the service ticket is returned by
>> CAS before it's actually stored in memcached. If this service ticket is then
>> presented to another instance of CAS for validation, that instance fails to
>> retrieve it from memcached because the "add" operation has not completed.
>>
>> Again, I have to emphasize that this is not an unrealistic test. jMeter is
>> simply following redirects at the time of the failure, as a browser would.
>
> We never saw that in production, and we ran 500 virtual users. However, if
> you are experiencing it, you most likely could update the
> MemCacheTicketRegistry to block on the Futures. I've actually updated the
> code in HEAD with an option to block on Futures. :-)
>
> I have not tried it at all, since I wrote it all of 30 seconds ago. You can
> grab it from HEAD and try it out. The new property to enable it is
> "synchronizeUpdatesToRegistry".
>
> Let me know if it helps/doesn't help.
>
> -Scott
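Roughly, "blocking on the Futures" means waiting for the memcached client's
asynchronous add to be acknowledged before CAS hands the service ticket back to
the browser, which closes the window visible in Adam's second log excerpt. A
minimal sketch of the idea with the spymemcached client; the wrapper class,
timeout and error handling here are illustrative assumptions, not the actual
MemCacheTicketRegistry code:

    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.MemcachedClient;

    public class BlockingRegistrySketch {

        private final MemcachedClient client;
        private final int serviceTicketTimeoutSeconds = 300;   // matches the third constructor-arg above

        public BlockingRegistrySketch() throws Exception {
            client = new MemcachedClient(AddrUtil.getAddresses(
                "SERVER1.ess.rutgers.edu:11211 SERVER2.ess.rutgers.edu:11211"));
        }

        public void addTicket(String ticketId, Object ticket) throws Exception {
            // add() is asynchronous: it returns a Future immediately while the client
            // ships the operation to memcached in the background. That gap is where a
            // second CAS node can try, and fail, to validate the freshly issued ticket.
            Future<Boolean> stored = client.add(ticketId, serviceTicketTimeoutSeconds, ticket);

            // Blocking on the Future before returning means the redirect carrying the
            // ticket is not sent until memcached has acknowledged the "add".
            if (!stored.get(2, TimeUnit.SECONDS)) {
                throw new IllegalStateException("memcached did not store " + ticketId);
            }
        }
    }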
>>
>> Adam
>>
>> Scott Battaglia wrote:
>>
>> You have no need for sticky sessions. If you have two repcached servers and
>> you've told your CAS instance about both of them, the memcached client
>> essentially sees them as two memcached servers (since it's not familiar with
>> repcached).
>>
>> The memcached client works in that it takes a hash of the key, and that
>> determines which instance of memcached/repcached the item is stored on.
>> repcached will then do its async replication. When you come to validate a
>> ticket, the memcached client will again hash the key to determine which
>> server the item is stored on. If that server is unreachable (as determined
>> by the memcached client), then it will try the next likely server that would
>> hold the data.
>>
>> -Scott
>>
>> -Scott Battaglia
>> PGP Public Key Id: 0x383733AA
>> LinkedIn: http://www.linkedin.com/in/scottbattaglia
>>
>> On Fri, Oct 24, 2008 at 8:21 AM, Andrew Ralph Feller, afelle1 <[EMAIL PROTECTED]> wrote:
>>
>>> So what you are saying is that even with replication enabled, asynchronous
>>> replication CAS clusters should have sticky sessions on regardless? I
>>> realize that synchronous replication CAS clusters have no need of sticky
>>> sessions, seeing as how it goes to all servers before the user can move on.
>>>
>>> Andrew
>>>
>>> On 10/23/08 9:29 PM, "Scott Battaglia" <[EMAIL PROTECTED]> wrote:
>>>
>>> It actually shouldn't matter if the async works or not. The memcache
>>> clients are designed to hash to a particular server and only check the
>>> backup servers if the primary isn't available.
>>>
>>> So you should always be validating against the original server unless it's
>>> no longer there.
>>>
>>> -Scott
>>>
>>> -Scott Battaglia
>>> PGP Public Key Id: 0x383733AA
>>> LinkedIn: http://www.linkedin.com/in/scottbattaglia
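The selection behaviour described above can be pictured with a small sketch:
hash the ticket id to pick one server from the list, and only walk to the next
server if that one is unreachable. This is a simplified illustration of the
idea, not the real memcached client's algorithm (actual clients typically use a
more elaborate, often consistent, hash):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class ServerSelectionSketch {

        private final List<String> servers = Arrays.asList(
            "SERVER1.ess.rutgers.edu:11211", "SERVER2.ess.rutgers.edu:11211");
        private final Set<String> unreachable = new HashSet<String>();

        public String pickServer(String ticketId) {
            // Every client hashing the same ticket id lands on the same server, which
            // is why both CAS nodes normally validate against the node that stored the
            // ticket in the first place -- no sticky sessions required.
            int start = (ticketId.hashCode() & 0x7fffffff) % servers.size();
            for (int i = 0; i < servers.size(); i++) {
                String candidate = servers.get((start + i) % servers.size());
                if (!unreachable.contains(candidate)) {
                    return candidate;   // the primary, or the next server if the primary is down
                }
            }
            throw new IllegalStateException("no memcached server reachable");
        }
    }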
>>> On Thu, Oct 23, 2008 at 9:17 PM, Adam Rybicki <[EMAIL PROTECTED]> wrote:
>>>
>>> Scott,
>>>
>>> I have run into an issue with MemCacheTicketRegistry and was wondering if
>>> you have any thoughts. I didn't want to create a new thread for this note.
>>> Anyone else with comments should feel free to reply, too. ;-)
>>>
>>> My tests have shown that when a ticket is generated on one CAS cluster
>>> member it may sometimes fail to validate. This is apparently because the
>>> memcached asynchronous replication did not manage to send the ticket
>>> replica in time. Fast as repcached may be, under a relatively light load,
>>> ST validation failed in 0.1% of the cases, or once in 1000 attempts. It
>>> would seem that the following sequence of steps should take a while:
>>>
>>> - Browser accesses a CAS-protected service
>>> - Service redirects to CAS for authentication
>>> - CAS validates the TGT
>>> - CAS issues the ST for the service
>>> - CAS redirects the browser to the service
>>> - Service sends the ST for validation
>>>
>>> But they are fast! My jMeter testing showed this taking 28 milliseconds
>>> under light load on the CAS server, which is amazingly fast. Please note
>>> that in real life this can be just as fast, because the browser, CAS, and
>>> service perform these steps without the user slowing them down. CAS is
>>> indeed a lightweight system, and memcached does nothing to slow it down. It
>>> seems that in 0.1% of the cases this outruns repcached's replication under
>>> light load. The bad news is that under heavy load the failure rate
>>> increases; I've seen as bad as an 8% failure rate.
>>>
>>> Have you or anyone else seen this? Have you had to work around this?
>>>
>>> Thanks,
>>> Adam
>>>
>>> Scott Battaglia wrote:
>>>
>>> On Tue, Oct 14, 2008 at 11:15 AM, Andrew Ralph Feller, afelle1 <[EMAIL PROTECTED]> wrote:
>>>
>>> Hey Scott,
>>>
>>> Thanks for answering some questions; really appreciate it. Just a handful more:
>>>
>>> 1. What happens whenever the server it intends to replicate with is down?
>>>
>>> It doesn't replicate. :-) The client will send its request to the primary
>>> server, and if the primary server is down it will replicate to the
>>> secondary. The repcached server itself will not replicate to the other
>>> server if it can't find it.
>>>
>>> 2. What happens whenever it comes back up?
>>>
>>> The repcached servers will sync with each other. The memcache clients will
>>> continue to function as they should.
>>>
>>> 3. Does the newly recovered machine synchronize itself with the other servers?
>>>
>>> The newly recovered machine will synchronize with its paired memcached server.
>>>
>>> -Scott
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On 10/14/08 9:56 AM, "Scott Battaglia" <[EMAIL PROTECTED]> wrote:
>>>
>>> Memcache, as far as I know, uses a hash of the key to determine which
>>> server to write to (and then with repcached, it's replicated to its pair,
>>> which you configure).
>>>
>>> -Scott
>>>
>>> -Scott Battaglia
>>> PGP Public Key Id: 0x383733AA
>>> LinkedIn: http://www.linkedin.com/in/scottbattaglia
>>>
>>> On Tue, Oct 14, 2008 at 10:38 AM, Andrew Ralph Feller, afelle1 <[EMAIL PROTECTED]> wrote:
>>>
>>> Scott,
>>>
>>> I've looked at the sample configuration file on the JA-SIG wiki; however, I
>>> was curious how memcached handles cluster membership, for lack of a better
>>> word. One of the things we are getting burned on by JBoss/JGroups is the
>>> frequency with which the cluster is being fragmented.
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On 10/14/08 8:58 AM, "Scott Battaglia" <[EMAIL PROTECTED]> wrote:
>>>
>>> We've disabled the registry cleaners since memcached has explicit timeouts
>>> (which are configurable on the registry). We've configured it by default
>>> with 1 GB of RAM, I think, though I doubt we need that much.
>>>
>>> -Scott
>>>
>>> -Scott Battaglia
>>> PGP Public Key Id: 0x383733AA
>>> LinkedIn: http://www.linkedin.com/in/scottbattaglia
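The reason the registry cleaner can go away is that every item handed to
memcached carries its own expiration and is evicted server-side once the time
is up; presumably that is what the 21600 and 300 constructor arguments in the
configuration at the top control, and the 300-second expiration does show up on
the "add" lines in the verbose log above. A throwaway demonstration against a
local memcached, with a made-up key and a 5-second expiry purely for
illustration:

    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.MemcachedClient;

    public class ExpirySketch {
        public static void main(String[] args) throws Exception {
            MemcachedClient client = new MemcachedClient(AddrUtil.getAddresses("localhost:11211"));

            client.set("ST-demo", 5, "ticket payload").get();   // store with a 5-second expiration
            System.out.println(client.get("ST-demo"));          // -> ticket payload

            Thread.sleep(6000);
            System.out.println(client.get("ST-demo"));          // -> null, evicted by memcached itself

            client.shutdown();
        }
    }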
>>> On Mon, Oct 13, 2008 at 11:41 PM, Patrick Hennessy <[EMAIL PROTECTED]> wrote:
>>>
>>> I've been working on updating from 3.2 to 3.3 and wanted to give memcached
>>> a try instead of JBoss. I read Scott's message about performance, and we've
>>> had good success here with memcached for other applications. It also looks
>>> like using memcached instead of JBoss will simplify the configuration
>>> changes for the CAS server.
>>>
>>> I do have the JBoss replication working with CAS 3.2, but pounding the heck
>>> out of it with JMeter will cause some not-so-nice stuff to happen. I'm
>>> using VMware VI3 and configured an isolated switch for the clustering and
>>> Linux-HA traffic. I do see higher traffic levels coming to my cluster in
>>> the future, but I'm not sure if they'll reach the levels from my JMeter
>>> test. (I'm just throwing this out there because of the recent best-practice
>>> thread.)
>>>
>>> If I use memcached, is the ticketRegistryCleaner not needed anymore? I left
>>> those beans in the ticketRegistry.xml file and saw all kinds of errors.
>>> After taking them out it seems to load fine and appears to work, but I
>>> wasn't sure what the behavior is, and I haven't tested it further. What if
>>> memcached fills up all the way? Does anyone have a general idea of how much
>>> memory to allocate to memcached with regard to concurrent logins and
>>> tickets stored?
>>>
>>> Thanks,
>>>
>>> Pat
>>> --
>>> Patrick Hennessy ([EMAIL PROTECTED])
>>> Senior Systems Specialist
>>> Division of Information and Educational Technology
>>> Delaware Technical and Community College
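On the memory question, a very rough sizing aid: the verbose log earlier in the
thread shows serialized service tickets of about 2,689 bytes. Taking that
figure, and assuming (purely for illustration) a larger size for
ticket-granting tickets and particular numbers of live tickets, a
back-of-the-envelope estimate looks like this:

    // Ballpark sizing for the memcached instance backing the ticket registry.
    // The ~2.7 KB service-ticket size comes from the "add ... 2689" lines in the
    // verbose log above; the TGT size and the ticket counts are assumptions.
    public class TicketMemoryEstimate {
        public static void main(String[] args) {
            long serviceTicketBytes = 2689;      // observed serialized ST size
            long grantingTicketBytes = 8192;     // assumed TGT size (attributes make these bigger)
            long liveServiceTickets = 10000;     // STs expire after ~300 s, so few are resident
            long liveGrantingTickets = 100000;   // assumed concurrent SSO sessions

            long totalBytes = serviceTicketBytes * liveServiceTickets
                    + grantingTicketBytes * liveGrantingTickets;
            System.out.printf("~%.0f MB for ticket payloads%n", totalBytes / (1024.0 * 1024.0));
            // ~807 MB in this scenario, before memcached's own slab overhead, so a
            // 1 GB cache is in the right ballpark for a fairly large deployment.
        }
    }

By default, if memcached does fill up it evicts the least recently used items
to make room, so the symptom would be long-idle tickets disappearing early
rather than new writes failing outright.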
_______________________________________________
Yale CAS mailing list
[email protected]
http://tp.its.yale.edu/mailman/listinfo/cas
