[
https://issues.apache.org/jira/browse/QPID-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ken Giusti resolved QPID-5773.
------------------------------
Resolution: Fixed
Fix Version/s: (was: Future)
0.29
> Qpid Protocol Negotiation Sometime Fails with Python Qpid with SSL
> ------------------------------------------------------------------
>
> Key: QPID-5773
> URL: https://issues.apache.org/jira/browse/QPID-5773
> Project: Qpid
> Issue Type: Bug
> Components: Python Client
> Affects Versions: 0.24, 0.26
> Environment: Python QPID 0.24 or 0.26 Client, C++ Broker (same
> version)
> Python 2.6 (RHEL 6.5)
> Eventlet Monkey Patched for all but OS
> More than 15 concurrent connections writing messages
> Using OpenStack Oslo QPID Driver to send Messages
> Reporter: Brent Tang
> Assignee: Ken Giusti
> Fix For: 0.29
>
> Attachments: transports.py
>
>
> When running an application using OpenStack (using the QPID driver for
> Messaging) at large scales, we have found that connections are aborted
> shortly after they are established. This is because the Broker closes the
> connection after the Max Negotiation time because it does not believe the
> Protocol Negotiation has completed (no matter how large the Max Negotiation
> time is set to).
> To reproduce this issue we created a test program that starts 50 threads
> with a connection pool size of 20 and has each thread write messages over
> and over to simulate a large number of concurrent writes. This reproduced
> the issue every time once the number of connections rose above 13 or 14.
> In going back to the 0.22 Python QPID Client (with any version of the
> broker), we found this issue did not occur, but if we use the 0.24 or 0.26
> version of the Python Client, it occurred every time. Tracing this back
> through trial and error, it came down to the change in the recv method in
> the transports.py module.
> We are still not sure what the underlying root cause of it is but have
> isolated it down to the usage of the "recv_into" method on the ssl socket
> when the eventlet monkey patching is done (so that the eventlet code is in
> the middle). When the monkey patching is taken out this works fine, or if
> the read method is called instead it works fine.
> With the change made in QPID-4872 there are benefits to doing the retry on
> the write/send buffer, but the current implementation for the recv isn't
> really helping to pass the buffer down to the OpenSSL code since the ssl
> socket recv_into just does a read anyway.
> So irrespective of what the actual issue is with using recv_into with
> the eventlet code in the middle, it seems like this can be changed back to a
> read to solve the problem.
> The proposal here is to either change the transports.py recv back to what it
> was in the 0.22 version, or to do something like this (which will minimize
> the risk of the change because it would still be doing the retry on the same
> number of bytes from the previous failed call like the 0.24 / 0.26 versions
> are doing). (Note: This change was done to minimize the number of lines
> changed to show the difference, so the variable names may not be the most
> relevant now):
> def recv(self, n):
>     # Remember the byte count so a retry asks for the same amount.
>     if self.read_retry is None:
>         self.read_retry = n
>     self._clear_state()
>     try:
>         r = self.tls.read(self.read_retry)
>         self.read_retry = None
>         return r
>     except SSLError as e:
>         if self._update_state(e.args[0]):
>             # will retry on next invocation
>             return None
>         self.read_retry = None
>         raise
>     except:
>         self.read_retry = None
>         raise
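
The retry behaviour of the proposed recv can be exercised without a broker. The following is a minimal sketch, not the Qpid implementation: FakeTLS and RetryingReader are hypothetical names invented for illustration, the check against SSL_ERROR_WANT_READ / SSL_ERROR_WANT_WRITE is an assumed stand-in for the _update_state helper in transports.py, and the code is written for a current Python rather than the Python 2.6 environment in the report.

```python
import ssl


class FakeTLS(object):
    """Hypothetical stand-in for an SSL socket with scripted read() results."""

    def __init__(self, script):
        self.script = list(script)
        self.requested = []  # byte counts passed to read(), kept for inspection

    def read(self, n):
        self.requested.append(n)
        item = self.script.pop(0)
        if isinstance(item, Exception):
            raise item
        return item[:n]


class RetryingReader(object):
    """Illustrative wrapper that retries a failed read with the same size."""

    def __init__(self, tls):
        self.tls = tls
        self.read_retry = None  # byte count of a read that must be retried

    def recv(self, n):
        if self.read_retry is None:
            self.read_retry = n
        try:
            data = self.tls.read(self.read_retry)
        except ssl.SSLError as e:
            # Assumption: these two codes mean "try again later".
            if e.args[0] in (ssl.SSL_ERROR_WANT_READ,
                             ssl.SSL_ERROR_WANT_WRITE):
                return None  # keep read_retry for the next invocation
            self.read_retry = None
            raise
        except Exception:
            self.read_retry = None
            raise
        self.read_retry = None
        return data


# First read signals WANT_READ, second read delivers the data.
tls = FakeTLS([ssl.SSLError(ssl.SSL_ERROR_WANT_READ, "want read"),
               b"abcdefgh"])
reader = RetryingReader(tls)
first = reader.recv(8)      # read would block, so recv reports no data yet
second = reader.recv(4096)  # retried with the original size, not 4096
```

In this sketch the second call ignores its own argument and repeats the earlier request for 8 bytes, which is the "retry on the same number of bytes" property the proposal aims to keep while calling read instead of recv_into.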