Re: Strange SSL_shutdown() error return (SSL_ERROR_SYSCALL but errno == 0)
Antoine Pitrou wrote:
> Well, in our case, and unless I'm mistaken, ret == -1, ERR_get_error() == 0
> and then errno (the Unix errno) == 0.

With SSL_shutdown(), by virtue of its unique mechanics, you will not see ret == 0 (in the way the SSL_get_error man page describes), since that value has a different and special meaning. It marks the first point at which ((SSL_get_shutdown() & SSL_SENT_SHUTDOWN) == SSL_SENT_SHUTDOWN) would be true. Compare SSL_read(), which can return 0, and there it does mean EOF. You can then test ((SSL_get_shutdown() & SSL_RECEIVED_SHUTDOWN) == SSL_RECEIVED_SHUTDOWN) to find out whether it was a secure EOF.

=== RANT MODE

If the OpenSSL SSL_shutdown() API could have been made better, this is certainly one area to improve, i.e. make SSL_shutdown() return the current state like SSL_get_shutdown() does (which means non-zero states). Then reuse the return value 0 to mean EOF on the transport and keep -1/WANT_READ/WANT_WRITE/ERROR_SYSCALL as-is. This would mean (simplified understanding):

 * old version returned 0, new version returns 1 (SSL_SENT_SHUTDOWN).
 * old version returned 1, new version returns 3 (SSL_SENT_SHUTDOWN|SSL_RECEIVED_SHUTDOWN).

Unfortunately this would have broken historical compatibility. It took quite a while to get the minimum-breakage patch in to achieve my goals; by the end of that time, thinking about improving OpenSSL (rather than bug fixing it) was long out of my mind.

I'm all for breaking APIs to make things better, providing it's done in a responsible way. A poorly thought out API call can't hog a popular API symbol forever, otherwise the whole product starts to weaken.

=== RANT MODE

> Perhaps errno gets cleared by another operation... I may try to investigate
> if I get some time.

Well, now that I've looked at Python's Modules/_ssl.c to understand the context of your usage, I can see you are using the standard stuff for BIO.
I know that errno is being set to 0 by OpenSSL before it makes the read() system call (openssl-1.0.0/crypto/bio/bss_fd.c:150, function fd_read(), calls clear_sys_error(), which does errno=0; see openssl-1.0.0/e_os.h). Then (I presume) it gets read()==0 from the kernel (bss_fd.c:151). Of course a read()==0 does not modify errno in libc.

So in openssl-1.0.0/ssl/s3_lib.c:3191, inside the SSL_shutdown() implementation, you can see the error return is ignored, since returning 0 from here has a different documented meaning.

I think this is the sequence of events you observe. Unfortunately I can't confirm it, since I can't get the test cases from Python's SVN to run.

Darryl
__
OpenSSL Project http://www.openssl.org
User Support Mailing List openssl-users@openssl.org
Automated List Manager majord...@openssl.org
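The sequence described in this message (clear errno, then read() a descriptor whose peer has gone away) can be reproduced without OpenSSL. The following is only a minimal sketch under stated assumptions: a pipe stands in for the socket, closing the write end stands in for the remote close(), and the names eof_probe and probe_read_at_eof are hypothetical helpers, not anything from OpenSSL or Python. On common Unix libc implementations a successful read() leaves errno untouched, which is the behaviour this sketch relies on.

```c
#include <errno.h>
#include <unistd.h>

/* Result of one read() attempt at end-of-file. */
struct eof_probe {
    long nread;      /* value returned by read() */
    int  errno_seen; /* errno sampled immediately after read() */
};

/*
 * Hypothetical helper (not part of OpenSSL): reproduce the sequence
 * "clear errno, then read() a descriptor whose peer has closed".
 * A pipe stands in for the socket; closing the write end plays the
 * role of the remote close().
 */
static struct eof_probe probe_read_at_eof(void)
{
    struct eof_probe r = { -1, -1 };
    int fds[2];
    char buf[16];

    if (pipe(fds) != 0)
        return r;             /* could not set up the stand-in "socket" */

    close(fds[1]);            /* the "peer" hangs up */

    errno = 0;                /* what the message says clear_sys_error() does */
    r.nread = (long)read(fds[0], buf, sizeof buf);
    r.errno_seen = errno;     /* sampled right after read() returns */

    close(fds[0]);
    return r;
}
```

If the explanation in this message is right, a caller that later reports "I/O error, consult errno" for such a read would be looking at an errno that was never set by any failing system call.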
Re: Strange SSL_shutdown() error return (SSL_ERROR_SYSCALL but errno == 0)
Antoine Pitrou wrote:
> What I'm specifically interested in is SSL_ERROR_SYSCALL with errno==0.

I have investigated this issue of -1/SSL_ERROR_SYSCALL with errno==0. From the SSL_get_error(3) man page:

  SSL_ERROR_SYSCALL
      Some I/O error occurred. The OpenSSL error queue may contain more
      information on the error. If the error queue is empty (i.e.
      ERR_get_error() returns 0), ret can be used to find out more about
      the error: If ret == 0, an EOF was observed that violates the
      protocol. If ret == -1, the underlying BIO reported an I/O error
      (for socket I/O on Unix systems, consult errno for details).

Note the use of "may contain more information"; there is no guarantee.

Note the confirmation that ret==0 is for the specific condition of EOF (on the BIO, i.e. on the socket; it violates the protocol because the protocol expects to receive a shutdown notify packet, which would have been caused by the far end calling SSL_shutdown() at least once).

You have used the term errno where really OpenSSL talks in terms of the error codes off the error stack.

I also note the man page doesn't include SSL_shutdown() in the very specific list of calls that SSL_get_error() is used in sympathy with. However, it was my intention to bring SSL_shutdown() into line, so that man page should also be updated to include SSL_shutdown().

My claim is that the other end did a close() on the socket while you were trying/sending/waiting-for the two-way SSL shutdown process to complete. This would be observed as an end-of-file condition, i.e. read() returns 0. This is considered an SSL3/TLS1 protocol violation because the protocol expects all users to always make use of the cryptographically secure two-stream shutdown.

I have then taken a look at Python from CVS and see that ./Modules/_ssl.c function PySSL_SetError() does attempt to handle SSL_ERROR_SYSCALL as per the documentation. Whoever wrote that did read the man page.
While I agree with the sentiment that having the exact errno saved and available for inspection/recall by the application using OpenSSL would be very useful, I don't agree that SSL_shutdown() is acting against the existing documentation.

Unfortunately I am not sure myself how errno values from read/write or recv/send calls get onto the OpenSSL error stack. Auditing the source reveals very few places where get_last_socket_error() is called in relation to normal recv/send IO operations. So I'm almost able to say it is not possible to retrieve errno values for anything other than the connect setup phase (where a variety of errors can occur: ECONNREFUSED, ETIMEDOUT, ENETUNREACH, ... check out the connect(2) man page). This is also the stage where most people have problems, and so historically it has required the most detail to resolve them.

There is however one mystery: how EPIPE from a write() is getting propagated back in Python. I can only think that Python's custom BIO is providing this information, as I can't see how OpenSSL's own socket BIO implementation does that. The strangeness of printing errno==0 as the reason for an error actually points towards a facet of Python's BIO layer and BIO error handling.

Darryl
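The decision procedure that the SSL_get_error(3) text quoted in this message lays out for SSL_ERROR_SYSCALL can be written down as a small pure function. This is a sketch only: the enum and the function explain_syscall_error are hypothetical names invented for illustration, the caller is assumed to pass in the value that ERR_get_error() returned and an errno sampled immediately after the failing call, and the separate "errno unset" outcome is an addition reflecting the case discussed in this thread rather than something the man page names.

```c
/* Hypothetical labels for this sketch; they are not OpenSSL names. */
enum syscall_cause {
    CAUSE_SEE_ERROR_QUEUE,  /* queue is non-empty: inspect it first */
    CAUSE_EOF_PROTOCOL,     /* ret == 0: EOF that violates the protocol */
    CAUSE_IO_ERRNO,         /* ret == -1 and errno names the I/O error */
    CAUSE_IO_ERRNO_UNSET,   /* ret == -1 but errno == 0 (the case in this thread) */
    CAUSE_UNDOCUMENTED      /* any other ret value */
};

/*
 * Interpret an SSL_ERROR_SYSCALL result.
 *   ret          - return value of the failed SSL_* call
 *   queued_error - what ERR_get_error() returned (0 means queue empty)
 *   saved_errno  - errno sampled right after the failed call
 */
static enum syscall_cause explain_syscall_error(int ret,
                                                unsigned long queued_error,
                                                int saved_errno)
{
    if (queued_error != 0)
        return CAUSE_SEE_ERROR_QUEUE;

    if (ret == 0)
        return CAUSE_EOF_PROTOCOL;

    if (ret == -1)
        return saved_errno != 0 ? CAUSE_IO_ERRNO : CAUSE_IO_ERRNO_UNSET;

    return CAUSE_UNDOCUMENTED;
}
```

Keeping the "errno unset" outcome separate lets an application report something more useful than "error 0", for example by treating it as an abrupt close by the peer, which is the reading argued for in this thread.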
Re: Strange SSL_shutdown() error return (SSL_ERROR_SYSCALL but errno == 0)
Hello again,

> I have investigated this issue of -1/SSL_ERROR_SYSCALL with errno==0. From
> the SSL_get_error(3) man page:
>
>   SSL_ERROR_SYSCALL
>       Some I/O error occurred. The OpenSSL error queue may contain more
>       information on the error. If the error queue is empty (i.e.
>       ERR_get_error() returns 0), ret can be used to find out more about
>       the error: If ret == 0, an EOF was observed that violates the
>       protocol. If ret == -1, the underlying BIO reported an I/O error
>       (for socket I/O on Unix systems, consult errno for details).

Well, in our case, and unless I'm mistaken, ret == -1, ERR_get_error() == 0 and then errno (the Unix errno) == 0. Perhaps errno gets cleared by another operation... I may try to investigate if I get some time.

Regards

Antoine.
Re: Strange SSL_shutdown() error return (SSL_ERROR_SYSCALL but errno == 0)
Antoine Pitrou wrote:
> These issues are tracked together at http://bugs.python.org/issue8108 ,
> because they both appeared when someone tried OpenSSL 0.9.8m.

I have read through the discussion. First I'd like to confirm the scenario for the errno==0 situation through a particular sequence of events. I have an SSL protocol test-case creator that can manipulate both ends' OpenSSL API usage in a co-ordinated fashion; it should be straightforward to cause an abrupt socket closure around/during SSL_shutdown() usage.

> Ok, thanks for the clarification. We were a bit baffled by errno==0 (EPIPE,
> ECONNABORTED, EBADF... would have been much more helpful).

I agree with this, it should return a more useful value.

> So, in any case, I can interpret an SSL_ERROR_SYSCALL return from
> SSL_shutdown() as a "the socket was closed more or less abruptly" response?
> There are no other possible reasons for this error return?

This is the intention of the error indication. My presumption at this time is to believe it, as no proof has been submitted otherwise. Further investigation may alter this statement.

>> But to be a good well meaning TLS/SSL citizen both ends should continue
>> their non-blocking event loops for a reasonable amount of time (in the
>> order of 5 to TCP timeout seconds) even after the last SSL_write() has
>> been made.
>
> He, well. The interesting thing here is that we are testing a blocking FTP
> TLS client with a non-blocking (event loop-based) server. The blocking
> client can't really sleep() for 5 seconds when closing the FTP session. At
> least I think users wouldn't like it :-) Also, the client doesn't try to
> shutdown the SSL layer when closing its connection. According to the
> client's author, this is contrary to the RFC. In his own words:

This is in sympathy with my claim. To reiterate, it is up to an individual protocol/application to decide if it requires a secure cryptographic shutdown or not.
It is also up to the individual protocol/application to decide the course of action to take when it doesn't happen. So if the protocol spec for FTP TLS makes a claim one way or the other, that is a matter for that specification.

Since the FTP protocol has a clear QUIT command to mark the moment when the client has no further use of the control connection, there is actually no need to perform a full SSL_shutdown() to make the system safe from attack. This doesn't mean you shouldn't attempt to do SSL_shutdown().

> ftplib.FTP_TLS class already calls unwrap() but only when closing a secured
> *data* connection. This is never done for the *control* connection as the
> examples shown in RFC-4217 do that only when dealing with the CCC command
> which is intended to switch the control connection back to clear text.
> Since ftplib.py does not implement the CCC command I would avoid to
> override its close() method.

You need to be clear in your own mind which statements from the FTP TLS specification are:

 * mandating, and
 * suggesting / recommending, and
 * matters on which it doesn't indicate any opinion

The fact that something ISN'T shown in an example should not be taken as any kind of statement; it is just that: that specific example didn't express that particular matter. Interpret only the rules that are written as rules; anything else is open to interpretation.

You also need to go and read the original RFC first-hand and come to your own interpretation. Then compare your interpretation to that of the ftplib author.

> (if you have an opinion on this specific point -- no implicit SSL shutdown
> when closing the FTP session --, I'd like to hear it. Although it isn't
> really part of the issue at hand).

You'd need to educate me in the specifics of the FTP TLS protocol. I am very experienced with all the details of the classic FTP protocol. For FTP TLS:

 * does it make use of 2 sockets like FTP?
 * are both sockets encrypted with TLS (at all times before any transaction starts)?
 * is the ftp-data socket opened/closed once for each file like FTP?
 * is the payload data inside the ftp-data socket just the exact number of bytes in the single file being transferred?

So in the interests of trying to convey a better understanding of the TLS shutdown issue, please read the following claims and attempt to understand the goals behind each claim rather than the specific detail (in respect of FTP TLS, since I do not fully understand every detail of FTP TLS at this time).

Things to consider:

 * Any unencrypted channel falls outside the scope of TLS (and thus any points made right below).
 * If the encrypted command channel has a QUIT command, and the specification (or de facto default implementation) requires that the channel, after receiving such a command, writes back a single response and then stops processing any further commands, it can be said that you already have an in-band shutdown
Re: Strange SSL_shutdown() error return (SSL_ERROR_SYSCALL but errno == 0)
Would you please confirm to the list the name of the Python module, the download site for it and the version you are currently working with. This just helps us provide assistance with this same question in future.

Please read up on this recent thread. I do not know anything about Python modules myself, but I believe this user was also debugging a similar issue:

http://www.mail-archive.com/openssl-users@openssl.org/msg60444.html
"Problems with SSL_shutdown() and non blocking socket" from Victor Stinner on 12-Mar-2010.

Please collaborate with the official maintainers of the Python module so that a fix is incorporated upstream ASAP. If you have any further questions on the matter please direct them to this list (openssl-users).

Thanks,

Darryl
Re: Strange SSL_shutdown() error return (SSL_ERROR_SYSCALL but errno == 0)
Long info, because I fear the Python module may be misunderstanding what SSL_shutdown() actually does and why it exists, which in turn means that users of the Python module also misuse it (sandcastles in the sand and all that).

Antoine Pitrou wrote:
> While testing Python's SSL support with OpenSSL >= 0.9.8m, we have
> encountered a strange error return from SSL_shutdown on a non-blocking
> socket (note: this is a different problem from the one described by Victor
> Stinner in an earlier thread last month). Basically:
>
> - SSL_shutdown(ssl object) returns -1
> - SSL_get_error(ssl object, -1) returns SSL_ERROR_SYSCALL
> - ERR_get_error() returns 0
> - errno is equal to 0
>
> This situation was not hit before 0.9.8m. Our tentative workaround right
> now (not yet committed, awaiting your insight :-)) is to detect this
> particular situation and consider the call successful rather than raise an
> exception.

It depends what you mean by "consider the call successful". There are 2 normal non-error states for SSL_shutdown() API calls: returning 0 and returning 1. You should never consider a return of -1 to mean 1. Also, a return of 1 is really the only value that indicates success. Then you have errors that are either recoverable (what I term soft-errors) or non-recoverable (hard-errors).

But as the recent mailing list thread indicates ( http://www.mail-archive.com/openssl-users@openssl.org/msg60444.html ), you may consider the specific soft-error returns of -1/WANT_READ and -1/WANT_WRITE to be successful, as if SSL_shutdown() had returned 0, if you are happy with keeping the non-descriptive behavior of older OpenSSL releases.

The SSL_ERROR_SYSCALL is probably because the underlying socket is no longer functional (see the comments over EPIPE / ZERO_RETURN in the recent openssl-users list thread).

You must understand it is SSL_shutdown()'s job to commence, advance and confirm that a cryptographically secure two-way shutdown has been performed. This is its purpose in the world.
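The return states described above (0, 1, soft errors and hard errors) can be summarised as a small classification function. This is a sketch under stated assumptions, not the Python module's code: the enum and classify_shutdown are hypothetical names, the caller is assumed to pass the SSL_shutdown() return value together with the SSL_get_error() result for it, and the three constants are defined locally with the values used by <openssl/ssl.h> only so the fragment compiles without the OpenSSL headers.

```c
/*
 * Numeric values mirror <openssl/ssl.h>; the guard lets this sketch
 * compile without the OpenSSL headers. A real build would include
 * that header instead of defining these.
 */
#ifndef SSL_ERROR_WANT_READ
#define SSL_ERROR_WANT_READ  2
#define SSL_ERROR_WANT_WRITE 3
#define SSL_ERROR_SYSCALL    5
#endif

/* Hypothetical outcome labels for this sketch (not OpenSSL names). */
enum shutdown_outcome {
    SHUTDOWN_COMPLETE,   /* ret == 1: shutdown notify both sent and received */
    SHUTDOWN_SENT_ONLY,  /* ret == 0: ours sent, the peer's not yet seen */
    SHUTDOWN_RETRY,      /* soft error: wait for I/O readiness, call again */
    SHUTDOWN_FAILED      /* hard error: secure shutdown did not complete */
};

/*
 * ret       - value returned by SSL_shutdown()
 * ssl_error - value returned by SSL_get_error() for that ret
 */
static enum shutdown_outcome classify_shutdown(int ret, int ssl_error)
{
    if (ret == 1)
        return SHUTDOWN_COMPLETE;
    if (ret == 0)
        return SHUTDOWN_SENT_ONLY;
    if (ssl_error == SSL_ERROR_WANT_READ || ssl_error == SSL_ERROR_WANT_WRITE)
        return SHUTDOWN_RETRY;
    /* Includes -1/SSL_ERROR_SYSCALL: never to be reported as success. */
    return SHUTDOWN_FAILED;
}
```

Whether an application then raises an error on SHUTDOWN_FAILED or quietly closes the socket is its own policy decision; the point of the sketch is only that -1 is never mapped to the same outcome as 1.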
If you are seeing -1/ERROR_SYSCALL then that is a _CORRECT_ thing for it to return in response to observing that state while trying to perform its mission. What SSL_shutdown() is saying by returning -1/ERROR_SYSCALL is that a cryptographically secure two-way shutdown of the stream was _NOT_ completed and that it will probably never be able to be completed, probably because the underlying socket died on us. This is a fact of life you have to live with and deal with in your application now.

The reason for the "probably" items is that I'm sure there are other reasons that can cause it, but practically most people will see this error indication at this stage due to those factors.

So thinking that SSL_shutdown() was successful would be incorrect, on the basis of my definition of the purpose of SSL_shutdown(). A cryptographically secure shutdown was not completed, therefore SSL_shutdown() was not successful.

I'm sorry that I've introduced this quasi-fuzziness into what was a nice clean wonderland of the Python SSL module. But it is a reality that an application should deal with and make its own choice about.

Many applications don't care for a cryptographically secure shutdown of the communication transport, since they might indicate their intention to QUIT in the normal application payload data. The other end would then send back a "Bye bye, quit" response message (in the normal application payload data) and the server end goes into a state of never accepting any further commands from the client after that.

Over and above all this, once each end has queued the last command/response data in respect of the QUIT command processing, and once that application payload has successfully cleared the SSL_write() API call, that end can immediately proceed to calling SSL_shutdown(). This will commence proceedings in respect of a secure cryptographic shutdown, by denying any further SSL_write() calls (from your side) and by sending an end-of-stream indication packet to the other end.
You then have to wait (and hope) for the other end to send its end-of-stream indication packet before you will see SSL_shutdown() return 1 on your side. Only once you have both sent and received the end-of-stream indication packet will SSL_shutdown() return 1.

Many client and server implementations just hang up on each other once the QUIT command response has been processed. I would guess the issue you are seeing with -1/ERROR_SYSCALL is due to this hanging up.

But to be a good well meaning TLS/SSL citizen both ends should continue their non-blocking event loops for a reasonable amount of time (in the order of 5 to TCP timeout seconds) even after the last SSL_write() has been made. During this time both ends retry SSL_shutdown() over and over until it returns 1 (each time they get a non-blocking wakeup indication).

So you have to stand back for a moment and examine Python's use of the OpenSSL API and decide if