> On May 21, 2015, 8:51 p.m., Alan Conway wrote:
> > There is something more going on here. If the client is re-connecting then
> > it should be re-connecting on an entirely new connection so sessions should
> > not be able to clash, the new connection should have no session on it.
> >
> > It sounds to me like somehow the client is re-using the original connection
> > after you unblock it at the firewall, which is definitely not right - there
> > could be all kinds of invalid state in that connection's sessions. If the
> > client decides a connection is faulty it should definitively close it and
> > forget it before re-connecting and re-establishing sessions. You need to
> > track down how/why it is managing to use the old connection after it has
> > failed.
>
> Gordon Sim wrote:
>     'If the client is re-connecting then it should be re-connecting on an
>     entirely new connection so sessions should not be able to clash' - this is
>     true in the sense that it is a new connection as far as the broker is
>     concerned, however the client will keep using the qpid.messaging.Connection
>     and will reattach all the corresponding sessions. If the broker hasn't
>     determined that the old connection is now dead, it won't have closed the
>     old sessions, which could indeed result in a naming clash as the text
>     above hypothesises.

Ah yes - because 0-10 session names are not scoped to the connection, I had
forgotten that. So we either have to rename the session on the client or force
the broker to allow the new session. 0-10 has a "force" flag on attach for
exactly that, but looking at the broker/SessionManager code I see
"// FIXME aconway 2008-04-29: implement force" :(
It probably wouldn't be hard to implement though.


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34560/#review84786
-----------------------------------------------------------


On May 22, 2015, 4:52 p.m., Ernie Allen wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34560/
> -----------------------------------------------------------
>
> (Updated May 22, 2015, 4:52 p.m.)
>
>
> Review request for qpid, Alan Conway and Kenneth Giusti.
>
>
> Repository: qpid
>
>
> Description
> -------
>
> Calling receiver.fetch(timeout=10) in a loop while the network drops packets
> for a while causes an uncaught KeyError exception in python-qpid-0.22. It
> causes semi-infinite recursion in python-qpid-0.30.
>
> The recursion problem was solved independently.
>
> The attached patch does two things:
> 1) session.close() checks to see if the session is already closed. If so, it
> just returns. This prevents an exception from being displayed when the
> session is already closed.
> 2) In driver.py, if we get a do_session_detached() event, check to see if the
> channel is in our list of sessions before using it. If it isn't, close the
> session.
>
> Here is my estimate of what is happening when the network drops:
> - The driver detects the socket error, closes the engine and goes into its
> retry loop.
> - Once the network comes back, the engine is restarted and all the sessions
> on the connection are re-attached.
> - However, the broker sees the attempt to attach using a channel that it
> thinks is already attached.
> - The broker logs the following:
>   2015-05-21 14:51:35 [Broker] error Channel exception: session-busy: Session already attached: anonymous.5c6f079c-571e-46f8-8ce6-72997da200a3:0 (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/broker/SessionManager.cpp:55)
>   2015-05-21 14:51:35 [Broker] error Channel exception: not-attached: Channel 0 is not attached (/home/eallen/workspace/32/rh-qpid/qpid/cpp/src/qpid/amqp_0_10/SessionHandler.cpp:39)
> - This results in a do_session_detached() event in the engine.
> - However, since the engine was closed when the socket error occurred and
> reopened when it cleared, it doesn't know about the old session.
>
> If I only test whether the channel number being detached is associated with
> a session, and just return when it is not, then the client hangs. So when I
> see an event to detach an unknown session, I'm closing the engine and raising
> a ConnectionError back to the client.
>
> Ideally the driver/engine would recover, but I don't see how we can get the
> broker and client back into agreement.
>
>
> Diffs
> -----
>
>   trunk/qpid/python/qpid/messaging/driver.py 1680941
>
> Diff: https://reviews.apache.org/r/34560/diff/
>
>
> Testing
> -------
>
> 1. Run this script against a qpidd broker:
> #!/usr/bin/env python
> from qpid.messaging import *
> import datetime
>
> conn = Connection("localhost:5672", reconnect=10)
> timeout=10
>
> try:
>     conn.open()
>     sess = conn.session()
>
>     recv = sess.receiver("testQueue;{create:always}")
>
>     # Fetch in a loop until the connection is reported as failed.
>     while (1):
>         print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout)
>         msg = Message()
>         try:
>             msg = recv.fetch(timeout=timeout)
>         except ReceiverError, e:
>             print e
>         except ConnectError, e:
>             print "ConnectError", str(e)
>             break
>         print "%s: after fetch, msg=%s" % (datetime.datetime.now(), msg)
>
>     print "about to close session"
>     sess.close()
>
> except ReceiverError, e:
>     print e
> except KeyboardInterrupt:
>     pass
>
> print "about to close connection"
> conn.close()
>
> 2. Simulate a network outage:
> iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date
>
> 3. Once the python script writes 'No handlers could be found for logger
> "qpid.messaging"', flush iptables (iptables -F)
>
> 4. Wait up to 10 seconds
>
> The ConnectError is received by the client and the loop can be exited.
>
>
> Thanks,
>
> Ernie Allen
>
>
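
For readers who don't have driver.py open, the two guards described in the
patch can be sketched in isolation. This is only a toy model under stated
assumptions: ToySession, ToyDriver and ToyConnectionError are hypothetical
names invented for illustration and are not the qpid.messaging classes. Only
the shape of the checks (an idempotent close(), and an error for a detach on an
unknown channel) follows the description above.

```python
class ToyConnectionError(Exception):
    """Hypothetical stand-in for the error raised back to the client."""


class ToySession(object):
    """Hypothetical session; not the real qpid.messaging Session."""

    def __init__(self, name, channel):
        self.name = name
        self.channel = channel
        self.closed = False
        self.close_calls = 0  # counts closes that actually did work

    def close(self):
        # Guard 1: closing an already-closed session just returns.
        if self.closed:
            return
        self.closed = True
        self.close_calls += 1


class ToyDriver(object):
    """Hypothetical driver that tracks sessions by channel number."""

    def __init__(self):
        self._sessions = {}  # channel number -> ToySession
        self.engine_open = True

    def attach(self, session):
        self._sessions[session.channel] = session

    def do_session_detached(self, channel):
        # Guard 2: look the channel up before using it.
        session = self._sessions.pop(channel, None)
        if session is None:
            # Unknown channel: close the engine and surface an error
            # instead of failing with a KeyError.
            self.engine_open = False
            raise ToyConnectionError(
                "detach for unknown channel %d" % channel)
        session.close()
```

With this model, a second detach for the same channel surfaces as a
ToyConnectionError rather than a KeyError, and calling close() twice on the
same session does the work only once.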
