After putting session replication in for CAS 3.3.5 via jBoss we just recently 
started to notice odd behavior in our logs. I'll describe the behavior and then 
list some ideas we've had and would like some input on these ideas...

Behavior:

In our tomcat logs, we see the following happen somewhat frequently:

Mar 11, 2010 4:08:19 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster 
memberDisappeared
INFO: Received member 
disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://XXX.XXX.XXX.XXX:4001,catalina,XXX.XXX.XXX.XXX,4001,
 alive=125032780]
Mar 11, 2010 4:08:19 AM org.apache.catalina.cluster.tcp.SimpleTcpCluster 
memberAdded
INFO: Replication member 
added:org.apache.catalina.cluster.mcast.McastMember[tcp://XXX.XXX.XXX.XXX:4001,catalina,XXX.XXX.XXX.XXX,4001,
 alive=125035743]

This has been gonig on for quite some time now, and happens semi-consistently. 
Sometimes once every 10-30 minutes and sometimes 6-10 times in succession 
within a few seconds. We don't think session replication is technically 
failing, but rather "resetting" itself after the mcast fails. We had the 
clustering setup in tomcat 5.5.27 to be fairly standard:

        <Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
                 
managerClassName="org.apache.catalina.cluster.session.DeltaManager"
                 expireSessionsOnShutdown="false"
                 useDirtyFlag="true">

            <Membership 
                className="org.apache.catalina.cluster.mcast.McastService"
                mcastAddr="X.X.X.X"
                mcastPort="45564"
                mcastFrequency="500"
                mcastDropTime="3000"
                mcastTTL="1"/>

            <Receiver 
                className="org.apache.catalina.cluster.tcp.ReplicationListener"
                tcpListenAddress="auto"
                tcpListenPort="4001"
                tcpSelectorTimeout="100"
                tcpThreadCount="6"/>

            <Sender
                
className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
                replicationMode="synchronous"/>

            <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
                   
filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>
                   
            <Deployer 
className="org.apache.catalina.cluster.deploy.FarmWarDeployer"
                      tempDir="/tmp/war-temp/"
                      deployDir="/tmp/war-deploy/"
                      watchDir="/tmp/war-listen/"
                      watchEnabled="false"/>
                      
            <ClusterListener 
className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
        </Cluster>

Also noticed was another odd set of logs:

Mar 12, 2010 2:36:45 PM org.apache.catalina.cluster.tcp.TcpReplicationThread 
sendAck
WARNING: Unable to send ACK back through channel, channel disconnected?: 
Connection reset by peer
Mar 12, 2010 2:36:45 PM org.apache.catalina.cluster.tcp.TcpReplicationThread 
sendAck
WARNING: Unable to send ACK back through channel, channel disconnected?: Broken 
pipe

This would happen maybe once or twice a day. This one is a bit trickier to 
figure out the cause of.

The conclusion was that the mcast timeouts were most likely a little low, and 
that since there were no other errors... come Monday we'd just up the timeout 
values to:

                mcastFrequency="2000"
                mcastDropTime="10000"

But this morning we also noticed (before changing these timeout values) that 
the logging had changed a bit. The "memberDisappeared" & "memberAdded" logs 
seemed to slide off and were replaced with another type of log which was a bit 
more consistent (happening upwards of 10 minute intervals) This error makes me 
somewhat concerned:

Mar 15, 2010 3:59:43 AM org.apache.catalina.cluster.tcp.ReplicationValve 
sendReplicationMessage
SEVERE: Unable to perform replication request.
java.util.ConcurrentModificationException
        at java.util.ArrayList.writeObject(ArrayList.java:573)
        at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:945)
        at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1461)
        at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150)
        at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1509)
        at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1474)
        at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1392)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1150)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:326)
        at 
org.apache.catalina.cluster.session.DeltaRequest$AttributeInfo.writeExternal(DeltaRequest.java:352)
        at 
org.apache.catalina.cluster.session.DeltaRequest.writeExternal(DeltaRequest.java:269)
        at 
org.apache.catalina.cluster.session.DeltaManager.unloadDeltaRequest(DeltaManager.java:743)
        at 
org.apache.catalina.cluster.session.DeltaManager.requestCompleted(DeltaManager.java:1320)
        at 
org.apache.catalina.cluster.tcp.ReplicationValve.send(ReplicationValve.java:549)
        at 
org.apache.catalina.cluster.tcp.ReplicationValve.sendMessage(ReplicationValve.java:536)
        at 
org.apache.catalina.cluster.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:518)
        at 
org.apache.catalina.cluster.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:428)
        at 
org.apache.catalina.cluster.tcp.ReplicationValve.invoke(ReplicationValve.java:362)
        at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
        at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:200)
        at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:283)
        at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:773)
        at 
org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703)
        at 
org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:895)
        at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
        at java.lang.Thread.run(Thread.java:619)

After this message, we usually get a replication stats message within a minute 
later:

ie.
Mar 15, 2010 1:48:47 PM org.apache.catalina.cluster.tcp.DataSender addStats
INFO: Send stats from [XXX.XXX.XXX.XXX:4,001], Nr of bytes sent=52,364,561 over 
14,000 = 3,740 bytes/request, processing time 0 msec, avg processing time 0 msec

Which I assume tells us that replication is up and going along fine.

We've upped the mcast timeout values on our servers, but are concerned that 
replication is actually failing now. Not sure why this started this morning. 
Going to do more network oriented research on this for certain.

Ideas:

I believe we should not encounter further issue with member connect/disconnect 
due to the timeout value mod. We are assuming that we do not see similar 
logging in our test env due to the lack of traffic on the test subnet vs a much 
higher traffic volume in our production subnet, thus possibly affecting 
mcast-ing. 

Aside from going to tomcat 6.x to achieve static membership (as opposed to 
mcast membership) I'm not sure what else could be done to possibly resolve the 
"SEVERE: Unable to perform replication request." logs we're getting. I also 
haven't been unable to determine the severity of this message either.

Any ideas or information?

Raymond Walker
Software Systems Engineer Sr.
ITS Northern Arizona University


-- 
You are currently subscribed to [email protected] as: 
[email protected]
To unsubscribe, change settings or access archives, see 
http://www.ja-sig.org/wiki/display/JSG/cas-user

Reply via email to