Pat,

We saw a number of replication errors as well. We haven't had much time to dig into them because we had to load test a large set of scenarios (distributed, database, single machine). If you find anything interesting, or an optimal configuration, please let the list know and update it in Confluence if you can ;-)

Thanks
-Scott

On Sun, Mar 9, 2008 at 9:24 AM, Pat Hennessy <[EMAIL PROTECTED]> wrote:

> So the first message I got was..
>
> 2008-03-06 23:51:20,838 ERROR
> [org.jasig.cas.ticket.registry.JBossCacheTicketRegistry] -
> <org.jboss.cache.ReplicationException: rsp=sender=138.123.130.81:32772,
> retval=null, received=false, suspected=false>
> org.jboss.cache.ReplicationException: rsp=sender=138.123.130.81:32772,
> retval=null, received=false, suspected=false
>
> I found an old message on the list..
>
> http://tp.its.yale.edu/pipermail/cas/2006-September/003412.html
>
> I'm using our VMWare Infrastructure for our two nodes. I had them on
> two different hosts, so I then moved them to the same VMWare host and
> everything appeared to be better. So, next week I plan on talking to
> the network guy about using multicast on that switch (I had spoken to
> him earlier and he thought the settings on it should be fine).
>
> I left one of my casified apps in the browser all night. This app does
> a page refresh and uses one of the Apache modules, so it's been helpful
> in finding these problems.
>
> The next day, I found another exception...
>
> 2008-03-08 11:02:09,130 ERROR
> [org.apache.catalina.core.ContainerBase.[Catalina].[c-cas-02.dtcc.edu].[/cas].[cas]]
> - <Servlet.service() for servlet cas threw exception>
> org.jboss.cache.lock.TimeoutException: Response timed out:
> sender=138.123.130.81:32772, retval=null, received=false, suspected=false
>
> These are happening on either node and it does appear that I get
> tickets and appear validated from both nodes. So the clustering is
> working ok, but there seems to be a timing issue.
>
> Did some more searching and saw references on some Japanese site for
> some totally different application to the "SyncReplTimeout" value. I
> also found some document on Redhat's site about setting up JBoss. So, I
> made some adjustments to the different timeouts. I think I read
> something somewhere on some JBoss site that some timeouts need to be
> shorter than others. I'm no Tomcat or JBoss expert, but I set the below
> settings and I think it's been behaving as expected..
>
> <attribute name="InitialStateRetrievalTimeout">15000</attribute>
>
> <attribute name="SyncReplTimeout">20000</attribute>
>
> <attribute name="LockAcquisitionTimeout">25000</attribute>
>
> I also wonder if the JBoss replication stuff is dependent on the system
> clock. I noticed one of them was off and had to fiddle with ntp.
>
> My next goal is to load test logins with JMeter. I did try another
> program that someone posted on the list, but I thought JMeter looked
> better. I haven't exactly gotten that one working yet. But just
> hitting the servers with page retrievals doesn't seem to cause any
> exceptions.
>
> Pat
>
> On 3/7/08 5:25 PM, Pat Hennessy wrote:
> > On 7/23/2007 11:45 AM, Brian Donnelly wrote:
> >> Thanks Scott,
> >>
> >> I've attached my jbossCache.xml config file. It is almost identical to
> >> the jbossTestCache.xml configuration included in CAS 3.0.6. I did have to
> >> comment out the authentication protocol version tag because it was
> >> generating errors.
> >>
> >> If anyone has any pointers or would be willing to send their JBossCache
> >> configuration parameters, I'd be very appreciative.
> >>
> >
> > Brian,
> >
> > Did you ever find a fix for the org.jboss.cache.ReplicationException
> > error you found?
> >
> > I just set up the jboss replication using the directions on the CAS wiki
> > (and the jbossTestCache.xml file). On the dev cluster, I didn't get the
> > error. After putting it on our new to be production cluster, I've been
> > finding the same error showing up as a RuntimeException with some of our
> > test apps. I don't think we are putting these services under any real load
> > though.
> >
> > Pat
> >
> >> Thanks,
> >>
> >> Brian Donnelly
> >> --
> >> Brian Donnelly
> >> University of California, Davis
> >> Information and Educational Technology
> >> Middleware Team
> >> (530) 754-5909
> >> [EMAIL PROTECTED]
> >>
> >> -----Original Message-----
> >> From: Scott Battaglia [mailto:[EMAIL PROTECTED]
> >> Sent: Fri 7/20/2007 6:29 AM
> >> To: Brian Donnelly; Yale CAS mailing list
> >> Subject: Re: JBossCache Ticket Registry performance under load?
> >>
> >> Brian,
> >>
> >> We don't deploy that at Rutgers so I can't comment on that. A few people
> >> have deployed it in production without issues. Maybe you can include your
> >> configuration file and those who have deployed it successfully can compare
> >> it to theirs if they get a minute (hopefully).
> >>
> >> Thanks
> >> -Scott
> >>
> >> On 7/18/07, Brian Donnelly <[EMAIL PROTECTED]> wrote:
> >>> Hi all,
> >>>
> >>> We're getting ready at UC Davis to switch to a JBossCache Clustered
> >>> configuration for our CAS installation. I have been load testing two
> >>> Redhat EL 5 clustered nodes running CAS 3.0.6 using the default
> >>> JBossCache implementation, (UDP multicast.)
> >>>
> >>> I've been using JMeter to generate ~7 login actions per second. Both
> >>> clustered servers perform fine for several hours. Somewhere in the
> >>> third hour of testing, I start seeing the following errors in the logs:
> >>>
> >>> 2007-07-18 13:43:54,813 ERROR
> >>> [org.jasig.cas.ticket.registry.JBossCacheTicketRegistry] -
> >>> <org.jboss.cache.ReplicationException: rsp=sender=169.237.104.235:53768,
> >>> retval=null, received=false, suspected=false>
> >>>
> >>> and
> >>>
> >>> 2007-07-18 13:48:33,448 ERROR
> >>> [org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/cas].[cas]]
> >>> - <Servlet.service() for servlet cas threw exception>
> >>>
> >>> These start piling up until both servers stop responding to incoming
> >>> requests. A restart is required to restore service.
> >>>
> >>> Has anyone else encountered errors of this type in their testing of the
> >>> JBossCache registry?
> >>>
> >>> Thanks,
> >>>
> >>> Brian Donnelly
> >>> --
> >>> Brian Donnelly
> >>> University of California, Davis
> >>> Information and Educational Technology
> >>> Middleware Team
> >>> (530) 754-5909
> >>> [EMAIL PROTECTED]
> >>> _______________________________________________
> >>> Yale CAS mailing list
> >>> [email protected]
> >>> http://tp.its.yale.edu/mailman/listinfo/cas
> >>>
> >>
> >
>
> --
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Pat Hennessy, RHCE ([EMAIL PROTECTED])
>
> Senior Systems Specialist
> Division of Information and Educational Technology
> Delaware Technical and Community College
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>

--
-Scott Battaglia
PGP Public Key Id: 0x383733AA
LinkedIn: http://www.linkedin.com/in/scottbattaglia
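
For anyone comparing configurations: a rough sketch of where the three timeouts Pat quotes would sit in a jbossCache.xml, assuming the file follows the usual JBoss Cache 1.x TreeCache service layout that jbossTestCache.xml is based on. The timeout values are the ones from Pat's message; the mbean wrapper and the CacheMode attribute are assumptions added for context and are not taken from anyone's actual file. All three values are in milliseconds.

```xml
<server>
  <!-- Assumed wrapper: the standard JBoss Cache 1.x TreeCache service definition -->
  <mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">

    <!-- Assumed: synchronous replication, the mode in which SyncReplTimeout applies -->
    <attribute name="CacheMode">REPL_SYNC</attribute>

    <!-- Values from Pat's message, in milliseconds -->
    <!-- Max wait for the initial state transfer when a node joins -->
    <attribute name="InitialStateRetrievalTimeout">15000</attribute>

    <!-- Max wait for all nodes to acknowledge a synchronous replication call -->
    <attribute name="SyncReplTimeout">20000</attribute>

    <!-- Max wait to acquire a lock on a cache node -->
    <attribute name="LockAcquisitionTimeout">25000</attribute>

  </mbean>
</server>
```

Whether this particular ordering of the timeouts (state retrieval shorter than replication, replication shorter than lock acquisition) is what actually cleared the exceptions has not been confirmed in this thread, so treat it as a starting point for comparison rather than a recommended setting.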
