Brent Tang created QPID-5773:
--------------------------------
Summary: Qpid Protocol Negotiation Sometime Fails with Python Qpid
with SSL
Key: QPID-5773
URL: https://issues.apache.org/jira/browse/QPID-5773
Project: Qpid
Issue Type: Bug
Components: Python Client
Affects Versions: 0.26, 0.24
Environment: Python QPID 0.24 or 0.26 Client, C++ Broker (same version)
Python 2.6 (RHEL 6.5)
Eventlet Monkey Patched for all but OS
More than 15 concurrent connections writing messages
Using OpenStack Oslo QPID Driver to send Messages
Reporter: Brent Tang
Fix For: Future
When running an application that uses OpenStack (with the QPID driver for
messaging) at large scale, we have found that connections are aborted
shortly after they are established. The broker closes the connection once the
max negotiation time expires because it does not believe the protocol
negotiation has completed (no matter how large the max negotiation time is
set).
To reproduce this issue we created a test program that starts 50 threads
sharing a connection pool of size 20, with each thread writing messages over
and over to simulate a large number of concurrent writes. This reproduced the
issue every time once the number of connections rose above 13 or 14.
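
For anyone wanting to build a similar load pattern, here is a minimal sketch of the shape of such a test program. It is not the program we ran: FakeConnection and run_writers are hypothetical names, the fake connection only counts sends, and all pooled connections are created up front for simplicity. A real run would replace FakeConnection with a factory that opens an SSL connection to the broker after eventlet monkey patching. (Written against Python 3's queue module; on Python 2 the module is named Queue.)

```python
import queue
import threading

POOL_SIZE = 20       # connection pool size from the report
WRITER_THREADS = 50  # number of writer threads from the report


class FakeConnection(object):
    """Hypothetical stand-in for a broker connection; it only counts sends."""

    def __init__(self):
        self.sent = 0

    def send(self, body):
        self.sent += 1


def run_writers(connect, messages_per_thread=10):
    """Share POOL_SIZE connections between WRITER_THREADS writer threads."""
    pool = queue.Queue()
    connections = [connect() for _ in range(POOL_SIZE)]
    for conn in connections:
        pool.put(conn)

    def writer(index):
        for seq in range(messages_per_thread):
            conn = pool.get()      # borrow a pooled connection (blocks if none free)
            try:
                conn.send("thread %d message %d" % (index, seq))
            finally:
                pool.put(conn)     # always hand the connection back

    threads = [threading.Thread(target=writer, args=(i,))
               for i in range(WRITER_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return connections


connections = run_writers(FakeConnection)
total_sent = sum(conn.sent for conn in connections)
```

Because a connection is held by only one thread at a time, the fake needs no locking of its own; the pool provides the exclusivity.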
Going back to the 0.22 Python QPID client (with any version of the broker),
we found this issue did not occur, but with the 0.24 or 0.26 version of the
Python client it occurred every time. Tracing this back through trial and
error, it came down to the change in the recv method in the transports.py
module.
We are still not sure what the underlying root cause is, but have isolated it
to the use of the "recv_into" method on the SSL socket when eventlet monkey
patching is in effect (so that the eventlet code is in the middle). When the
monkey patching is taken out, or when the read method is called instead, it
works fine.
With the change made in QPID-4872 there are benefits to doing the retry on the
write/send buffer, but the current implementation for recv gains nothing from
passing the buffer down towards the OpenSSL code, since the ssl socket
recv_into just does a read anyway.
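
To illustrate that last point, here is a rough sketch of what "recv_into just does a read" amounts to. It is an illustration only, not the ssl module's actual source: FakeTLS and recv_into_via_read are hypothetical names, and the fake object simply hands back bytes from memory.

```python
class FakeTLS(object):
    """Hypothetical stand-in for an SSL object that exposes read(n)."""

    def __init__(self, data):
        self._data = data

    def read(self, n):
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk


def recv_into_via_read(tls, buf, nbytes=None):
    """Fill buf by calling read() and copying; return the byte count."""
    if nbytes is None:
        nbytes = len(buf)
    chunk = tls.read(nbytes)     # a new bytes object is allocated here anyway
    buf[:len(chunk)] = chunk     # and is then copied into the caller's buffer
    return len(chunk)


tls = FakeTLS(b"AMQP\x01\x01\x00\x0a")
buf = bytearray(8)
count = recv_into_via_read(tls, buf)
```

If the underlying call behaves like this, handing a preallocated buffer to recv_into saves no copy compared with calling read directly.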
So irrespective of what the actual issue is with using recv_into with the
eventlet code in the middle, it seems like this can be changed back to a read
to solve the problem.
The proposal here is to either change the transports.py recv back to what it
was in the 0.22 version, or to do something like this (which will minimize the
risk of the change because it would still be doing the retry on the same number
of bytes from the previous failed call like the 0.24 / 0.26 versions are
doing). (Note: This change was done to minimize the number of lines changed to
show the difference, so the variable names may not be the most relevant now):
    def recv(self, n):
        if self.read_retry == None:
            self.read_retry = n
        self._clear_state()
        try:
            r = self.tls.read( self.read_retry )
            self.read_retry = None
            return r
        except SSLError, e:
            if self._update_state(e.args[0]):
                # will retry on next invocation
                return None
            self.read_retry = None
            raise
        except:
            self.read_retry = None
            raise
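
The retry behaviour of the proposed recv can be exercised without a broker. The following is a minimal sketch under stated assumptions: FlakyTLS and Transport are hypothetical test doubles, and the _clear_state/_update_state bookkeeping from transports.py is replaced by a plain check for the WANT_READ/WANT_WRITE error codes, which is a simplification. It uses the "except ... as e" form so that it also runs on Python 3.

```python
import ssl


class FlakyTLS(object):
    """Hypothetical TLS object: raises WANT_READ once, then returns data."""

    def __init__(self, data):
        self._data = data
        self.requested = []      # records the size asked for on each read
        self._fail_next = True

    def read(self, n):
        self.requested.append(n)
        if self._fail_next:
            self._fail_next = False
            raise ssl.SSLError(ssl.SSL_ERROR_WANT_READ, "want read")
        return self._data[:n]


class Transport(object):
    """Cut-down holder for the proposed recv; state tracking is simplified."""

    def __init__(self, tls):
        self.tls = tls
        self.read_retry = None

    def recv(self, n):
        if self.read_retry is None:
            self.read_retry = n
        try:
            r = self.tls.read(self.read_retry)
            self.read_retry = None
            return r
        except ssl.SSLError as e:
            if e.args[0] in (ssl.SSL_ERROR_WANT_READ,
                             ssl.SSL_ERROR_WANT_WRITE):
                return None      # keep read_retry so the next call retries
            self.read_retry = None
            raise


tls = FlakyTLS(b"AMQP\x01\x01\x00\x0a")
transport = Transport(tls)
first = transport.recv(8)        # hits WANT_READ, so nothing is returned yet
second = transport.recv(4096)    # retried with the original size of 8
```

The point of the sketch is that the second call ignores its own argument and repeats the read with the byte count of the failed call, which is the property the 0.24 / 0.26 code preserves and the proposal keeps.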