On 11/03/2009 06:13 AM, Shan Wang wrote:
Hi All,

We have two qpid 0.5 brokers running in cluster mode on two different boxes. 
The cluster works fine in normal cases, i.e., if broker1 is shut down cleanly, 
broker2 keeps serving clients. But today we found that one broker had suddenly 
stopped responding to all connected clients and admin tools. All producer and 
consumer clients were still connected but could not consume any messages from 
the queue. The command-line admin tool failed with a timeout error. The only 
error message we found is in the log of broker 1:

2009-oct-31 10:17:49 error 172.27.34.201:9908(READY/error) channel error 
157487219 on 172.27.34.201:9908-389(local): transport-busy: Channel 1 already 
attached to [email protected]
64-4e49-9bee-0538532fe261 (qpid/amqp_0_10/SessionHandler.cpp:150) (unresolved: 
172.27.34.201:9908 172.27.34.202:13287 )

After restarting only broker 1, everything started working again. Surprisingly, 
it seems that when one of the brokers in the cluster hit a problem, the whole 
cluster stalled, at least from the consumer's point of view (I can't be sure 
whether the producer was working during the downtime, but once things were 
back to normal the consumer did receive messages sent some time earlier). The 
consumer program uses FailoverManager and AsyncSession and is basically not far 
from the failover example in the qpid developer docs. Can anyone please tell me 
what the above error message means, and whether similar problems with the 
cluster have been seen before?


There have been a number of cluster bugs fixed since 0.5, some of which had the symptom of a "transport-busy" exception. Can you try a trunk build and see if you have the same problems?

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
