Re: Frequent "SEVERE: Unable to receive message through TCP channel" messages

2007-08-16 Thread nageshsrao

We are getting the GC output printed to the same catalina.out, and we see that
the memberAdded messages appear at almost the same time as the GC output. Does
this prove that longer GC pauses are causing this? Are there any other data
points or proof we can get?

Regarding network problems, we are asking the network team to capture the
multicast traffic between these nodes. Is there anything you suggest we
do?

Regarding increasing the membership timeout, we plan to raise it to 5
minutes. Do you have any other suggestions? Tomcat startup takes almost 70
seconds (it hosts almost 32 apps), and all of them are clustered.

Regards,



-- 
View this message in context: 
http://www.nabble.com/Frequent-%22SEVERE%3A-Unable-to-receive-message-through-TCP-channel%22-messages-tf4266454.html#a12176135
Sent from the Tomcat - User mailing list archive at Nabble.com.


-
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Frequent "SEVERE: Unable to receive message through TCP channel" messages

2007-08-16 Thread Rainer Jung

nageshsrao wrote:

We are getting the GC output printed to the same catalina.out, and we see that
the memberAdded messages appear at almost the same time as the GC output. Does
this prove that longer GC pauses are causing this? Are there any other data
points or proof we can get?


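E.g. add -XX:+PrintGCApplicationStoppedTime to the JVM options. It logs how
long the application threads were stopped for each pause, which you can
compare directly against your membership timeout.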
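To put numbers on this instead of lining up timestamps by eye, the output of
that flag can be scanned for pauses that reach the membership timeout. A
minimal sketch in Java, assuming the line format HotSpot JVMs up to JDK 8
print for this flag (`Total time for which application threads were stopped:
N seconds`) and a 3000 ms timeout; `StoppedTimeScan` is a hypothetical name,
not part of Tomcat or the JDK:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds stop-the-world pauses in a GC log that reach the membership timeout. */
class StoppedTimeScan {
    // Line printed by -XX:+PrintGCApplicationStoppedTime on HotSpot JDK 5-8 (assumed format).
    // A comma is accepted as decimal separator in case the JVM runs under such a locale.
    static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: (\\d+(?:[.,]\\d+)?) seconds");

    /** Pause in milliseconds, or -1 if the line is not a stopped-time line. */
    static double pauseMillis(String line) {
        Matcher m = STOPPED.matcher(line);
        if (!m.find()) {
            return -1;
        }
        return Double.parseDouble(m.group(1).replace(',', '.')) * 1000.0;
    }

    public static void main(String[] args) throws IOException {
        String logFile = args.length > 0 ? args[0] : "catalina.out";
        // Membership timeout in ms (3000 in the configuration discussed in this thread).
        double dropTimeMs = args.length > 1 ? Double.parseDouble(args[1]) : 3000;
        double longest = 0;
        int tooLong = 0;
        // ISO-8859-1 never fails on unexpected bytes, which catalina.out may contain.
        try (BufferedReader in = Files.newBufferedReader(Paths.get(logFile), StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                double pause = pauseMillis(line);
                if (pause < 0) {
                    continue; // not a stopped-time line
                }
                longest = Math.max(longest, pause);
                if (pause >= dropTimeMs) {
                    tooLong++;
                    System.out.println("Pause reaching the membership timeout: " + line.trim());
                }
            }
        }
        System.out.printf("Longest pause: %.1f ms; pauses >= %.0f ms: %d%n", longest, dropTimeMs, tooLong);
    }
}
```

Run it as `java StoppedTimeScan catalina.out 3000`. Any pause it reports is a
direct candidate for a memberDisappeared/memberAdded pair at the same time; if
the longest pause stays well below the timeout, GC is a less likely
explanation.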


Regarding network problems, we are asking the network team to capture the
multicast traffic between these nodes. Is there anything you suggest we
do?


If you are doing the multicast only inside a subnet, the usual basic 
network monitoring should be sufficient. But during phases where you 
have problems that might be network related, it is often good to stay in 
touch with the network people and ask whether they know about any 
general network problems.


If your multicast traffic crosses subnet boundaries, the network needs 
to use multicast group membership protocols (e.g. IGMP), which involves 
complicated router configuration. Most users, though, don't need to 
cross subnets.



Regarding increasing the membership timeout, we plan to raise it to 5
minutes. Do you have any other suggestions? Tomcat startup takes almost 70
seconds (it hosts almost 32 apps), and all of them are clustered.


I would expect that your GC, even with a big heap, won't take longer than 
20 seconds; most likely it's much less. On the other hand, if you go to 5 
minutes, you would always need to wait 5 minutes between shutting down 
one node and starting it up again. It seems unrealistic to me that IT 
staff will obey that. I would suggest 30 seconds, plus a clear message in 
the startup script reminding people that they have to wait 30 seconds 
after stopping before starting again.


Regards,

Rainer




Re: Frequent "SEVERE: Unable to receive message through TCP channel" messages

2007-08-14 Thread Rainer Jung
You configured a 3-second timeout for your heartbeat. If a node doesn't 
receive a heartbeat packet for 3 seconds, it assumes the other node is 
dead and closes the incoming replication connection. If the other node 
is not really dead, it will still try to use this replication connection, 
which no longer works.


Why could this happen? One possible reason is GC pauses. If your GC 
pauses are longer than your membership heartbeat timeout, you will run 
into such problems.


During normal operation you should not see any memberDisappeared 
messages. They should only show up when you stop a node or it crashes, 
or when you have serious network problems affecting the multicast 
heartbeat packets.


If you decide to increase the membership timeout (which sounds like a 
good idea), keep in mind that you need to wait at least that long between 
stopping and restarting a node.


Regards,

Rainer

nageshsrao wrote:

Hi,

In our prod environment we have two Tomcat instances [5.0.27] running on two
Linux boxes [RHAS 3.0 update 8], accessed through Apache using mod_jk2.0.


Very frequently we see the following messages in catalina.out, and on 2
occasions Tomcat stopped responding and we had to restart it. The only errors
we see are the following: INFO messages that keep telling us a member
disappeared and was added, and once in a while SEVERE messages.

Could you let us know what could be causing this problem? Is there any
additional configuration needed? This environment has been running in
production for almost 18 months, and only recently [in the last 6 months]
have we seen this happen twice. I have attached the errors found in
catalina.out and the server.xml from both Tomcat instances.


http://www.nabble.com/file/p12142134/catalina-error.out catalina-error.out 
http://www.nabble.com/file/p12142134/server-app1.xml server-app1.xml 
http://www.nabble.com/file/p12142134/server-app2.xml server-app2.xml 

