TS hangs (deadlock) on HTTPS POST requests
------------------------------------------

                 Key: TS-1049
                 URL: https://issues.apache.org/jira/browse/TS-1049
             Project: Traffic Server
          Issue Type: Bug
          Components: Core, HTTP, SSL
    Affects Versions: 3.1.0, 3.1.1, 3.0.2
         Environment: RedHat Enterprise Linux 6.0, Intel 32-bit
            Reporter: Wilson Ho
            Priority: Blocker


A highly reproducible bug in which the body of an HTTPS POST request is never 
forwarded to the origin server.

A client submits an HTTPS POST request to TS, which is supposed to forward it to 
the backend/origin server over HTTP. TS processes the HTTP headers and 
establishes a connection to the origin server, but never reads the body of the 
HTTPS POST. The request hangs until the client times out and closes the 
connection.

To reproduce:
1) The client connects to TS over HTTPS (plain HTTP works fine).
2) It must be a POST request.
3) TS must use at least 2 worker threads.
4) It is easier to reproduce when the connections to the origin server are HTTP 
(not HTTPS).
5) The POST body must be large enough that the HTTP request headers and the POST 
body do *NOT* fit within the same TCP packet (2000 bytes is a good size).
6) I can consistently reproduce this problem using 2 separate clients, each 
simultaneously submitting 2 requests back to back (i.e., 2 requests from each 
client, 4 requests in total). This gives a high probability that at least one 
of the requests will hang.
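The recipe above can be scripted roughly as follows. This is a hedged sketch, not the client used for this report, and it assumes Python 3 with the standard socket, ssl and threading modules. The host and port are placeholders passed to the hypothetical reproduce() helper, the "/upload" path is made up, certificate verification is turned off on the assumption that the test TS instance uses a self-signed certificate, and each request opens its own connection ("Connection: close"), which the report does not specify. A request that hangs surfaces as a socket timeout.

```python
import socket
import ssl
import threading

BODY_SIZE = 2000          # the report suggests ~2000 bytes
CLIENTS = 2               # two separate clients...
REQUESTS_PER_CLIENT = 2   # ...each sending two requests back to back
PATH = "/upload"          # hypothetical request path


def build_post(host, path, body_size):
    """Return (header_bytes, body_bytes) for a POST with a body_size-byte body."""
    body = b"x" * body_size
    headers = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return headers, body


def send_post(ctx, host, port, timeout=30.0):
    """Send one HTTPS POST and return the raw response bytes."""
    headers, body = build_post(host, PATH, BODY_SIZE)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            # Two writes: headers and body go out in separate TLS records, and a
            # 2000-byte body cannot share one TCP segment with the headers on a
            # typical 1500-byte-MTU path.
            tls.sendall(headers)
            tls.sendall(body)
            chunks = []
            while True:
                data = tls.recv(4096)  # raises socket.timeout if TS hangs
                if not data:           # server closed the connection
                    break
                chunks.append(data)
            return b"".join(chunks)


def run_client(ctx, host, port, client_id, results):
    """One client: REQUESTS_PER_CLIENT requests, one after another."""
    for n in range(REQUESTS_PER_CLIENT):
        try:
            reply = send_post(ctx, host, port)
            status = reply.split(b"\r\n", 1)[0].decode("latin-1")
        except OSError as exc:  # covers socket.timeout and ssl.SSLError
            status = f"FAILED ({exc!r})"
        results.append((client_id, n, status))


def reproduce(host, port):
    """Run CLIENTS clients concurrently; return (client, request, status) rows."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False      # assumption: self-signed test certificate
    ctx.verify_mode = ssl.CERT_NONE
    results = []                    # list.append is atomic in CPython
    threads = [
        threading.Thread(target=run_client, args=(ctx, host, port, i, results))
        for i in range(CLIENTS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, reproduce("ts.example.com", 443), with a placeholder host, returns one (client, request, status) row per request; a "FAILED" row caused by a timeout marks a candidate hang.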

Observation:
1) Thread A accepted and processed the HTTP headers, then called 
"UnixNetProcessor::connect_re" to prepare a new connection to the origin server.
2) For the hang to occur, thread A must not yet have read the POST body; if it 
has, everything works fine.
3) A different thread, B, was assigned to handle the origin server connection. 
If thread A itself was picked, everything works fine.
4) Apparently, one of the first things thread B does is acquire the mutex used 
for reading from the client. (Why does it do that?)
5) While thread B was holding that mutex, thread A entered 
"SSLNetVConnection::net_read_io" and tried, but failed, to acquire it. Thread A 
typically retried "SSLNetVConnection::net_read_io" shortly afterwards, but gave 
up after the second failure. If thread B released the mutex soon enough, thread 
A could proceed normally and everything worked.
6) From this point on, the POST body is never read from the client, so there is 
nothing to proxy to the origin server, and neither the producer nor the consumer 
task is scheduled to run again until the client times out. I tried setting the 
client-side timeout as long as 3-5 minutes, and TS never recovered on its own; 
it stayed stuck until the client closed the connection.
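The timing dependence in steps 4-6 can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Traffic Server code: a plain threading.Lock stands in for the client-side read mutex, the hypothetical try_read_body() plays thread A's net_read_io attempts with a retry limit of 2 (the number of failures observed above), and the hypothetical simulate() has a second thread play thread B holding the lock. It only models the observed symptom, a reader that gives up after failed try-locks and is never rescheduled; it makes no claim about how the TS event system actually handles the retry.

```python
import threading

# Hypothetical model of steps 4-6; this is NOT Traffic Server code.
# vc_lock stands in for the client-side read mutex and MAX_ATTEMPTS for the
# two net_read_io attempts observed before thread A gave up.
MAX_ATTEMPTS = 2


def try_read_body(vc_lock):
    """Thread A: try-lock a bounded number of times, then give up.

    Returns (body_read, attempts). When it gives up, nothing in this model
    schedules another attempt, mirroring the reported hang.
    """
    attempts = 0
    while attempts < MAX_ATTEMPTS:
        attempts += 1
        if vc_lock.acquire(blocking=False):
            try:
                return True, attempts  # the POST body would be read here
            finally:
                vc_lock.release()
    return False, attempts


def simulate(b_releases_early):
    """Thread B takes vc_lock first; the calling thread then plays thread A."""
    vc_lock = threading.Lock()
    b_has_lock = threading.Event()
    b_may_release = threading.Event()

    def thread_b():
        with vc_lock:
            b_has_lock.set()
            b_may_release.wait()  # hold the lock until told to drop it

    b = threading.Thread(target=thread_b)
    b.start()
    b_has_lock.wait()  # thread B now owns vc_lock

    if b_releases_early:
        b_may_release.set()
        b.join()  # B has dropped the lock before A's first attempt
        return try_read_body(vc_lock)

    result = try_read_body(vc_lock)  # B still holds the lock: both attempts fail
    b_may_release.set()
    b.join()
    return result


print(simulate(b_releases_early=False))  # (False, 2): body never read
print(simulate(b_releases_early=True))   # (True, 1)
```

The only difference between the two runs is whether thread B drops the lock before thread A exhausts its attempts, which matches the "soon enough" timing sensitivity described in step 5.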

This is the first time I have used this bug tracker. Please let me know how I 
should provide the configuration files, trace logs, etc. Thanks!


--
This message is automatically generated by JIRA.
