Andrew,

Do you have any up-to-date instructions on compiling nsopenssl? For some
reason I'm getting a segfault the instant I try to use the client
ns_httpspost. 

I'm think it is related to the linux distribution, but the crash isn't
the random problem you are seeing. 

Thanks,

tom jackson

On Thu, 2009-04-30 at 17:59 -0500, Andrew Steets wrote:
> Hello,
> 
> We recently discovered a problem with the nsopenssl ns_httpsXXX client
> commands which was causing SSL close notify alerts (a.k.a. random
> binary garbage) to be written to unrelated (non-ssl) file descriptors
> in certain cases.  While we were trying to come up with a fix, we
> stumbled across some other nsopenssl issues.
> 
> If you aren't using the nsopenssl *client* functionality this is
> probably not interesting.  If you aren't interested in hacking the
> nsopenssl code then you should realize that this may be a potential
> source of frustration.  For anyone else, details follow.
> 
> All of the ns_https client TCL (https.tcl) commands eventually call
> ns_openssl_sockopen to open an SSL connection to a server.
> ns_openssl_sockopen, like ns_sockopen, returns two TCL channel ids,
> one of which is for reading and the other for writing.  The TCL
> channels are created in CreateTclChannel() in nsopenssl's tclcmds.c.
> The channels are stored in a pair of structs with the following
> definition:
> 
> typedef struct ChanInfo {
>     NsOpenSSLConn   *sslconn;
>     SOCKET           socket;
>     Tcl_Channel      chan;
>     void            *otherchaninfo;
> } ChanInfo;
> 
> so the write chaninfo holds a pointer to the read chaninfo and vice
> versa.  The channels are currently constructed such that the read
> channel is associated with the original socket fd created for the ssl
> connection, and the write channel is associated with another fd
> dup()'ed from the original.  They are both associated with the same
> NsOpenSSLConn struct, which itself holds the original socket fd as
> well.
> 
> The channel close function, ChanCloseProc(), has to deal with this two
> fd situation, and that is where we run into problems.  The close proc
> will close the fd associated with whichever channel is being closed,
> but will only shutdown the ssl connection when both channels have been
> closed.
> 
> Here is the slightly edited close chan code:
> 
> static int
> ChanCloseProc(ClientData arg, Tcl_Interp *interp)
> {
>     ChanInfo *chaninfo      = (ChanInfo *) arg;
>     ChanInfo *otherchaninfo = NULL;
> 
>     Tcl_UnregisterChannel(interp, chaninfo->chan);
>     ns_sockclose(chaninfo->socket);
>     chaninfo->socket = INVALID_SOCKET;
>     otherchaninfo = (ChanInfo *) chaninfo->otherchaninfo;
> 
>     if (otherchaninfo->socket == INVALID_SOCKET) {
>         ns_free(otherchaninfo);
>         NsOpenSSLConnDestroy(chaninfo->sslconn);
>         ns_free(chaninfo);
>     }
> 
>     return TCL_OK;
> }
> 
> One problem is that the ns_sockclose() call precedes the
> NsOpenSSLConnDestroy() call.  NsOpenSSLConnDestroy() calls
> SSL_shutdown() on the file descriptor which was previously closed with
> ns_sockclose().  SSL_shutdown() tries to write some ssl close notify
> messages on the fd.  There is no way this can succeed because the fd
> was already closed.  The error is siliently ignored.  Clearly the sock
> close needs to come after NsOpenSSLConnDestroy().
> 
> But there is more.  Now we need to examine two possible cases.
> 
> Case 1: The write channel is closed before the read channel.  In this
> case the dup fd is closed first, and the original FD is closed second.
>  There is a teensy little race condition here.  After the
> ns_sockclose() call, the OS may context switch to another thread which
> may call open(), dup(), socket() or anything that gets a new FD.  It's
> also possible that the FD that the OS returns for that call may have
> been the one which was previously closed with ns_sockclose().  If we
> then switch back to the original thread and call
> NsOpenSSLConnDestroy() -> SSL_shutdown(), then we will end up writing
> and reading on somebody else's file file descriptor!  This is
> obviously bad, but the chances of this race actually occuring are
> probably slim.
> 
> Case 2:  The read channel is closed before the write channel.  This is
> the worst.  The original fd, the one in the NsOpenSSLConn struct is
> closed, but NsOpenSSLConnDestroy is not called because the write
> channel is still open and the sslconn * still holds the now invalid
> fd.  Now we have a much larger window for that FD to be recycled by
> the OS and we don't necessarily need an unlikely context switch to be
> stung by the race.  The following ADP highlights this condition.
> 
> <%
> set fds [ns_openssl_sockopen -nonblock www.att.com 443]
> 
> set rfd [lindex $fds 0]
> set wfd [lindex $fds 1]
> 
> ns_adp_puts "rfd: $rfd"
> ns_adp_puts "wfd: $wfd"
> 
> _ns_https_puts 5 $wfd "GET / HTTP/1.0\r"
> 
> close $rfd
> 
> set tmpfd [open /tmp/nsopenssl w]
> 
> ns_adp_puts "tmpfd: $tmpfd"
> 
> close $wfd
> close $tmpfd
> %>
> 
> If you look at /tmp/nsopenssl after running this ADP, you should see
> some random binary garbage from the ssl shutdown writing on the wrong
> fd.
> 
> Here is a patch that we are using.  It switches the code to use only
> one FD (but still two TCL channels).  It should avoid any close/open
> FD races.  I have only tested on Linux, not sure how it might affect
> other platforms.  Sorry about the verbosity.
> 
> -Andrew
> 
> 
> --
> AOLserver - http://www.aolserver.com/
> 
> To Remove yourself from this list, simply send an email to 
> <lists...@listserv.aol.com> with the
> body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
> field of your email blank.


--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to 
<lists...@listserv.aol.com> with the
body of "SIGNOFF AOLSERVER" in the email message. You can leave the Subject: 
field of your email blank.

Reply via email to