Re: Frequent SEVERE: Unable to receive message through TCP channel messages
We are getting the GC output printed to the same catalina.out, and we see that the memberAdded messages appear at almost the same time as the GC output. Does that prove that long GC pauses are causing this? Are there any other data points or proof we can get?

Regarding network problems, we are asking the network team to capture the multicast traffic between these nodes. Is there anything else you suggest we do?

Regarding the membership timeout, we plan to increase it to 5 minutes. Do you have any other suggestions? Tomcat startup takes almost 70 seconds (it hosts almost 32 apps), and all of them are clustered.

regards,

Rainer Jung-3 wrote:

You configured a 3-second timeout for your heartbeat. If a node doesn't receive a heartbeat packet for 3 seconds, it assumes the other node is dead and closes the incoming replication connection. If the other node is not really dead, it will try to use this replication connection, which will no longer work.

Why could this happen? One possible reason is GC pauses. If your GC pauses are longer than your membership heartbeat timeout, you run into such problems.

During normal operations you should not observe any memberDisappeared messages. They should only show up when you stop a node, when it crashes, or when you have serious network problems that affect the multicast heartbeat packets.

If you decide to increase the membership timeout (which sounds like a good idea), keep in mind that you need to wait the given time between stopping and restarting a node.

Regards, Rainer

nageshsrao wrote:

Hi, In our prod environment we have two Tomcats [5.0.27] running on two Linux boxes [RHAS 3.0 update8], accessed through Apache using mod_jk2.0. Very frequently we see the following messages in catalina.out, and there have been about 2 instances where Tomcat stopped responding and we had to restart. The only errors that we see are the following.
There are INFO messages that keep telling us a member has disappeared and been added, and once in a while we have SEVERE messages. Could you let us know what could be causing this problem? Is any additional configuration needed? This environment has been running in production for almost 18 months, and of late [in the last 6 months] we have seen this happen twice. I have attached both the error log found in catalina.out and the server.xml from both Tomcats.

http://www.nabble.com/file/p12142134/catalina-error.out catalina-error.out
http://www.nabble.com/file/p12142134/server-app1.xml server-app1.xml
http://www.nabble.com/file/p12142134/server-app2.xml server-app2.xml

- To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--
View this message in context: http://www.nabble.com/Frequent-%22SEVERE%3A-Unable-to-receive-message-through-TCP-channel%22-messages-tf4266454.html#a12176135
Sent from the Tomcat - User mailing list archive at Nabble.com.
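One way to check how closely the membership messages track the GC output is to compare their timestamps. The following is only a minimal sketch: it assumes the timestamps have already been extracted from catalina.out by hand or by another script, and the function name, the `(start, duration)` pause tuples and the `slack_seconds` tolerance are all hypothetical. A close match is supporting evidence for the GC explanation, not proof.

```python
def events_during_pauses(event_times, pauses, slack_seconds=1.0):
    """Split membership event times into those near a GC pause and the rest.

    event_times: times of memberAdded/memberDisappeared messages, in seconds.
    pauses: (start, duration) tuples for GC pauses, in seconds.
    slack_seconds: hypothetical tolerance for clock and logging jitter.
    """
    near, other = [], []
    for t in event_times:
        in_pause = any(
            start - slack_seconds <= t <= start + duration + slack_seconds
            for start, duration in pauses
        )
        (near if in_pause else other).append(t)
    return near, other


# Made-up example values, not taken from the attached logs.
near, other = events_during_pauses(
    [10.0, 50.0, 101.5],
    [(9.5, 4.0), (100.0, 1.0)],
)
```

If most events end up in `near`, the GC explanation becomes more plausible; events left in `other` would point towards the network instead.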
Re: Frequent SEVERE: Unable to receive message through TCP channel messages
nageshsrao wrote:

> We are getting the GC output printed to the same catalina.out, and we see that the memberAdded messages appear at almost the same time as the GC output. Does that prove that long GC pauses are causing this? Are there any other data points or proof we can get?

E.g. the JVM option -XX:+PrintGCApplicationStoppedTime, which logs how long the application threads were stopped.

> Regarding network problems, we are asking the network team to capture the multicast traffic between these nodes. Is there anything else you suggest we do?

If you are doing the multicast only inside a subnet, the usual basic network monitoring should be sufficient. But during phases where you have problems that might be network related, it is often good to keep in touch with the network people, in order to find out whether they know about any general network problems.

If your multicast traffic crosses subnet borders, the network needs to use multicast group membership protocols, which involves complicated router configuration. Most users don't need to cross subnets, though.

> Regarding the membership timeout, we plan to increase it to 5 minutes. Do you have any other suggestions? Tomcat startup takes almost 70 seconds (it hosts almost 32 apps), and all of them are clustered.

I would expect that your GC, even with a big heap, won't take longer than 20 seconds. Most likely it's much less. On the other hand, if you go to 5 minutes, you would always need to wait 5 minutes between shutting down one node and starting it up again. It seems unrealistic to me that IT staff will stick to that. I would suggest 30 seconds and a clear message in the startup script to remind the people using it that they have to wait 30 seconds after stopping and before starting again.

Regards, Rainer
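To illustrate how the output of -XX:+PrintGCApplicationStoppedTime could be checked against the heartbeat timeout, here is a minimal sketch. It assumes the HotSpot line format "Total time for which application threads were stopped: N seconds" with a dot as decimal separator; the function name and the sample lines are made up for the example.

```python
import re

# Line printed by HotSpot when -XX:+PrintGCApplicationStoppedTime is set
# (assumed format; newer JVMs append further fields after "seconds").
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def pauses_over(lines, threshold_seconds):
    """Return every stopped time (in seconds) that exceeds the threshold."""
    hits = []
    for line in lines:
        match = STOPPED_RE.search(line)
        if match:
            stopped = float(match.group(1))
            if stopped > threshold_seconds:
                hits.append(stopped)
    return hits


# Made-up sample lines, not taken from the attached catalina-error.out.
sample = [
    "Total time for which application threads were stopped: 0.0012340 seconds",
    "INFO: some unrelated log line",
    "Total time for which application threads were stopped: 4.2500000 seconds",
]
long_pauses = pauses_over(sample, 3.0)
```

Running this over catalina.out with the threshold set to the current membership timeout (3 seconds here) shows whether any pause was long enough to make the other node drop the member.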
Re: Frequent SEVERE: Unable to receive message through TCP channel messages
You configured a 3-second timeout for your heartbeat. If a node doesn't receive a heartbeat packet for 3 seconds, it assumes the other node is dead and closes the incoming replication connection. If the other node is not really dead, it will try to use this replication connection, which will no longer work.

Why could this happen? One possible reason is GC pauses. If your GC pauses are longer than your membership heartbeat timeout, you run into such problems.

During normal operations you should not observe any memberDisappeared messages. They should only show up when you stop a node, when it crashes, or when you have serious network problems that affect the multicast heartbeat packets.

If you decide to increase the membership timeout (which sounds like a good idea), keep in mind that you need to wait the given time between stopping and restarting a node.

Regards, Rainer

nageshsrao wrote:

Hi, In our prod environment we have two Tomcats [5.0.27] running on two Linux boxes [RHAS 3.0 update8], accessed through Apache using mod_jk2.0. Very frequently we see the following messages in catalina.out, and there have been about 2 instances where Tomcat stopped responding and we had to restart. The only errors that we see are the following.

There are INFO messages that keep telling us a member has disappeared and been added, and once in a while we have SEVERE messages. Could you let us know what could be causing this problem? Is any additional configuration needed? This environment has been running in production for almost 18 months, and of late [in the last 6 months] we have seen this happen twice. I have attached both the error log found in catalina.out and the server.xml from both Tomcats.
http://www.nabble.com/file/p12142134/catalina-error.out catalina-error.out
http://www.nabble.com/file/p12142134/server-app1.xml server-app1.xml
http://www.nabble.com/file/p12142134/server-app2.xml server-app2.xml
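The timeout behaviour described in the reply above can be modelled roughly as follows. This is a simplified sketch, not Tomcat's actual membership code: the function names, the node name "app2" and the numbers are all hypothetical.

```python
def expired_members(last_heartbeat, now, timeout_seconds):
    """Return the members whose last heartbeat is older than the timeout.

    last_heartbeat maps a member name to the time (in seconds) at which
    its most recent heartbeat packet was received.
    """
    return sorted(
        name
        for name, seen in last_heartbeat.items()
        if now - seen > timeout_seconds
    )


def earliest_safe_restart(stopped_at, timeout_seconds):
    """Earliest time a stopped node should be started again.

    The other node only notices the stop once the timeout has passed.
    """
    return stopped_at + timeout_seconds


# app2 sent its last heartbeat at t=100 and then sat in a 5-second GC pause.
heartbeats = {"app2": 100.0}
dropped_with_3s = expired_members(heartbeats, now=105.0, timeout_seconds=3.0)
dropped_with_30s = expired_members(heartbeats, now=105.0, timeout_seconds=30.0)
```

With a 3-second timeout the paused node is dropped although it is still alive; with a longer timeout the same pause goes unnoticed, at the price of a longer wait before a restart.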