Alan Conway wrote:
On 11/03/2009 06:13 AM, Shan Wang wrote:
Hi All,
We have two qpid 0.5 brokers running in cluster mode on two different
boxes. The cluster works fine in normal cases, i.e., if broker 1 is
shut down cleanly, broker 2 keeps serving clients. But today we found
that one broker had suddenly stopped responding to all connected
clients and admin tools. All producer and consumer clients were still
connected, but the consumers could not consume any messages from the
queue. The command-line admin tool failed with a timeout error. The
only error message we found is in the log of broker 1:
2009-oct-31 10:17:49 error 172.27.34.201:9908(READY/error) channel
error 157487219 on 172.27.34.201:9908-389(local): transport-busy:
Channel 1 already attached to [email protected]
64-4e49-9bee-0538532fe261 (qpid/amqp_0_10/SessionHandler.cpp:150)
(unresolved: 172.27.34.201:9908 172.27.34.202:13287 )
After restarting only broker 1, everything started to work again. So,
surprisingly, it seems that when one of the brokers in the cluster had
a problem, the whole cluster stalled, at least from the consumer's
point of view. (I can't be sure whether the producer was working
during the downtime; once things were back to normal, the consumer did
receive messages sent some time earlier.) The consumer program uses
FailoverManager and AsyncSession, and is basically not far from the
failover example in the qpid developer documentation. Can anyone
please tell me what the above error message means, and whether similar
cluster problems have been seen before?
There have been a number of cluster bugs fixed since 0.5, some of
which had the symptom of a "transport-busy" exception. Can you try a
trunk build and see if you have the same problems?
Or, what distro and version of qpid are you running?
Carl.
---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:[email protected]