[AOLSERVER] nsopenssl client file descriptor issues

Andrew Steets Thu, 30 Apr 2009 16:40:51 -0700

Hello,

We recently discovered a problem with the nsopenssl ns_httpsXXX client
commands which was causing SSL close notify alerts (a.k.a. random
binary garbage) to be written to unrelated (non-ssl) file descriptors
in certain cases.  While we were trying to come up with a fix, we
stumbled across some other nsopenssl issues.


If you aren't using the nsopenssl *client* functionality this is
probably not interesting.  If you aren't interested in hacking the
nsopenssl code then you should realize that this may be a potential
source of frustration.  For anyone else, details follow.

All of the ns_https client TCL (https.tcl) commands eventually call
ns_openssl_sockopen to open an SSL connection to a server.
ns_openssl_sockopen, like ns_sockopen, returns two TCL channel ids,
one of which is for reading and the other for writing.  The TCL
channels are created in CreateTclChannel() in nsopenssl's tclcmds.c.
The channels are stored in a pair of structs with the following
definition:

typedef struct ChanInfo {
    NsOpenSSLConn   *sslconn;
    SOCKET           socket;
    Tcl_Channel      chan;
    void            *otherchaninfo;
} ChanInfo;

so the write chaninfo holds a pointer to the read chaninfo and vice
versa.  The channels are currently constructed such that the read
channel is associated with the original socket fd created for the ssl
connection, and the write channel is associated with another fd
dup()'ed from the original.  They are both associated with the same
NsOpenSSLConn struct, which itself holds the original socket fd as
well.

The channel close function, ChanCloseProc(), has to deal with this two
fd situation, and that is where we run into problems.  The close proc
will close the fd associated with whichever channel is being closed,
but will only shutdown the ssl connection when both channels have been
closed.

Here is the slightly edited close chan code:

static int
ChanCloseProc(ClientData arg, Tcl_Interp *interp)
{
    ChanInfo *chaninfo      = (ChanInfo *) arg;
    ChanInfo *otherchaninfo = NULL;

    Tcl_UnregisterChannel(interp, chaninfo->chan);
    ns_sockclose(chaninfo->socket);
    chaninfo->socket = INVALID_SOCKET;
    otherchaninfo = (ChanInfo *) chaninfo->otherchaninfo;

    if (otherchaninfo->socket == INVALID_SOCKET) {
        ns_free(otherchaninfo);
        NsOpenSSLConnDestroy(chaninfo->sslconn);
        ns_free(chaninfo);
    }

    return TCL_OK;
}

One problem is that the ns_sockclose() call precedes the
NsOpenSSLConnDestroy() call.  NsOpenSSLConnDestroy() calls
SSL_shutdown() on the file descriptor which was previously closed with
ns_sockclose().  SSL_shutdown() tries to write some ssl close notify
messages on the fd.  There is no way this can succeed because the fd
was already closed.  The error is siliently ignored.  Clearly the sock
close needs to come after NsOpenSSLConnDestroy().

But there is more.  Now we need to examine two possible cases.

Case 1: The write channel is closed before the read channel.  In this
case the dup fd is closed first, and the original FD is closed second.
 There is a teensy little race condition here.  After the
ns_sockclose() call, the OS may context switch to another thread which
may call open(), dup(), socket() or anything that gets a new FD.  It's
also possible that the FD that the OS returns for that call may have
been the one which was previously closed with ns_sockclose().  If we
then switch back to the original thread and call
NsOpenSSLConnDestroy() -> SSL_shutdown(), then we will end up writing
and reading on somebody else's file file descriptor!  This is
obviously bad, but the chances of this race actually occuring are
probably slim.

Case 2:  The read channel is closed before the write channel.  This is
the worst.  The original fd, the one in the NsOpenSSLConn struct is
closed, but NsOpenSSLConnDestroy is not called because the write
channel is still open and the sslconn * still holds the now invalid
fd.  Now we have a much larger window for that FD to be recycled by
the OS and we don't necessarily need an unlikely context switch to be
stung by the race.  The following ADP highlights this condition.

<%
set fds [ns_openssl_sockopen -nonblock www.att.com 443]

set rfd [lindex $fds 0]
set wfd [lindex $fds 1]

ns_adp_puts "rfd: $rfd"
ns_adp_puts "wfd: $wfd"

_ns_https_puts 5 $wfd "GET / HTTP/1.0\r"

close $rfd

set tmpfd [open /tmp/nsopenssl w]

ns_adp_puts "tmpfd: $tmpfd"

close $wfd
close $tmpfd
%>

If you look at /tmp/nsopenssl after running this ADP, you should see
some random binary garbage from the ssl shutdown writing on the wrong
fd.

Here is a patch that we are using.  It switches the code to use only
one FD (but still two TCL channels).  It should avoid any close/open
FD races.  I have only tested on Linux, not sure how it might affect
other platforms.  Sorry about the verbosity.

-Andrew


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
<[email protected]> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.

? nsopenssl.patch
Index: tclcmds.c
===================================================================
RCS file: /cvsroot/aolserver/nsopenssl/tclcmds.c,v
retrieving revision 1.51
diff -p -u -r1.51 tclcmds.c
--- tclcmds.c	13 Jun 2004 04:21:31 -0000	1.51
+++ tclcmds.c	30 Apr 2009 22:52:59 -0000
@@ -1502,7 +1502,7 @@ CreateTclChannel(NsOpenSSLConn *sslconn,
      */
 
     getschan->socket = sslconn->socket;
-    sprintf(channelName, "openssl%d", getschan->socket);
+    sprintf(channelName, "openssl%dr", getschan->socket);
     getschan->chan = Tcl_CreateChannel(
             &opensslChannelType,
             channelName,
@@ -1523,8 +1523,8 @@ CreateTclChannel(NsOpenSSLConn *sslconn,
      * Set up the write channel.
      */
 
-    putschan->socket = ns_sockdup(sslconn->socket);
-    sprintf(channelName, "openssl%d", putschan->socket);
+    putschan->socket = sslconn->socket;
+    sprintf(channelName, "openssl%dw", putschan->socket);
     putschan->chan = Tcl_CreateChannel(
             &opensslChannelType,
             channelName,
@@ -1647,14 +1647,17 @@ ChanCloseProc(ClientData arg, Tcl_Interp
 
     //Ns_Log(Debug, "*** CHAN DESTROY: %s", Tcl_GetChannelName(chaninfo->chan));
     Tcl_UnregisterChannel(interp, chaninfo->chan);
-    ns_sockclose(chaninfo->socket);
-    chaninfo->socket = INVALID_SOCKET;
     otherchaninfo = (ChanInfo *) chaninfo->otherchaninfo;
 
-    if (otherchaninfo->socket == INVALID_SOCKET) {
-    //Ns_Log(Debug, "*** SSL DESTROY");
-        ns_free(otherchaninfo);
+    if (otherchaninfo->socket != INVALID_SOCKET) {
+        // Only one channel closed at this point.
+        chaninfo->socket = INVALID_SOCKET;
+    } else {
+        // Ok both channels closed.  Destroy for realz.
+        //Ns_Log(Debug, "*** SSL DESTROY");
         NsOpenSSLConnDestroy(chaninfo->sslconn);
+        ns_sockclose(chaninfo->socket);
+        ns_free(otherchaninfo);
         ns_free(chaninfo);
     }

[AOLSERVER] nsopenssl client file descriptor issues

Reply via email to