Andrew, Do you have any up-to-date instructions on compiling nsopenssl? For some reason I'm getting a segfault the instant I try to use the client ns_httpspost.
I'm think it is related to the linux distribution, but the crash isn't the random problem you are seeing. Thanks, tom jackson On Thu, 2009-04-30 at 17:59 -0500, Andrew Steets wrote: > Hello, > > We recently discovered a problem with the nsopenssl ns_httpsXXX client > commands which was causing SSL close notify alerts (a.k.a. random > binary garbage) to be written to unrelated (non-ssl) file descriptors > in certain cases. While we were trying to come up with a fix, we > stumbled across some other nsopenssl issues. > > If you aren't using the nsopenssl *client* functionality this is > probably not interesting. If you aren't interested in hacking the > nsopenssl code then you should realize that this may be a potential > source of frustration. For anyone else, details follow. > > All of the ns_https client TCL (https.tcl) commands eventually call > ns_openssl_sockopen to open an SSL connection to a server. > ns_openssl_sockopen, like ns_sockopen, returns two TCL channel ids, > one of which is for reading and the other for writing. The TCL > channels are created in CreateTclChannel() in nsopenssl's tclcmds.c. > The channels are stored in a pair of structs with the following > definition: > > typedef struct ChanInfo { > NsOpenSSLConn *sslconn; > SOCKET socket; > Tcl_Channel chan; > void *otherchaninfo; > } ChanInfo; > > so the write chaninfo holds a pointer to the read chaninfo and vice > versa. The channels are currently constructed such that the read > channel is associated with the original socket fd created for the ssl > connection, and the write channel is associated with another fd > dup()'ed from the original. They are both associated with the same > NsOpenSSLConn struct, which itself holds the original socket fd as > well. > > The channel close function, ChanCloseProc(), has to deal with this two > fd situation, and that is where we run into problems. The close proc > will close the fd associated with whichever channel is being closed, > but will only shutdown the ssl connection when both channels have been > closed. > > Here is the slightly edited close chan code: > > static int > ChanCloseProc(ClientData arg, Tcl_Interp *interp) > { > ChanInfo *chaninfo = (ChanInfo *) arg; > ChanInfo *otherchaninfo = NULL; > > Tcl_UnregisterChannel(interp, chaninfo->chan); > ns_sockclose(chaninfo->socket); > chaninfo->socket = INVALID_SOCKET; > otherchaninfo = (ChanInfo *) chaninfo->otherchaninfo; > > if (otherchaninfo->socket == INVALID_SOCKET) { > ns_free(otherchaninfo); > NsOpenSSLConnDestroy(chaninfo->sslconn); > ns_free(chaninfo); > } > > return TCL_OK; > } > > One problem is that the ns_sockclose() call precedes the > NsOpenSSLConnDestroy() call. NsOpenSSLConnDestroy() calls > SSL_shutdown() on the file descriptor which was previously closed with > ns_sockclose(). SSL_shutdown() tries to write some ssl close notify > messages on the fd. There is no way this can succeed because the fd > was already closed. The error is siliently ignored. Clearly the sock > close needs to come after NsOpenSSLConnDestroy(). > > But there is more. Now we need to examine two possible cases. > > Case 1: The write channel is closed before the read channel. In this > case the dup fd is closed first, and the original FD is closed second. > There is a teensy little race condition here. After the > ns_sockclose() call, the OS may context switch to another thread which > may call open(), dup(), socket() or anything that gets a new FD. It's > also possible that the FD that the OS returns for that call may have > been the one which was previously closed with ns_sockclose(). If we > then switch back to the original thread and call > NsOpenSSLConnDestroy() -> SSL_shutdown(), then we will end up writing > and reading on somebody else's file file descriptor! This is > obviously bad, but the chances of this race actually occuring are > probably slim. > > Case 2: The read channel is closed before the write channel. This is > the worst. The original fd, the one in the NsOpenSSLConn struct is > closed, but NsOpenSSLConnDestroy is not called because the write > channel is still open and the sslconn * still holds the now invalid > fd. Now we have a much larger window for that FD to be recycled by > the OS and we don't necessarily need an unlikely context switch to be > stung by the race. The following ADP highlights this condition. > > <% > set fds [ns_openssl_sockopen -nonblock www.att.com 443] > > set rfd [lindex $fds 0] > set wfd [lindex $fds 1] > > ns_adp_puts "rfd: $rfd" > ns_adp_puts "wfd: $wfd" > > _ns_https_puts 5 $wfd "GET / HTTP/1.0\r" > > close $rfd > > set tmpfd [open /tmp/nsopenssl w] > > ns_adp_puts "tmpfd: $tmpfd" > > close $wfd > close $tmpfd > %> > > If you look at /tmp/nsopenssl after running this ADP, you should see > some random binary garbage from the ssl shutdown writing on the wrong > fd. > > Here is a patch that we are using. It switches the code to use only > one FD (but still two TCL channels). It should avoid any close/open > FD races. I have only tested on Linux, not sure how it might affect > other platforms. Sorry about the verbosity. > > -Andrew > > > -- > AOLserver - http://www.aolserver.com/ > > To Remove yourself from this list, simply send an email to > <lists...@listserv.aol.com> with the > body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: > field of your email blank. -- AOLserver - http://www.aolserver.com/ To Remove yourself from this list, simply send an email to <lists...@listserv.aol.com> with the body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: field of your email blank.