Dimitry Andric created TS-2481:
----------------------------------

             Summary: Incorrect origin server port used sometimes (with 
keep-alive)
                 Key: TS-2481
                 URL: https://issues.apache.org/jira/browse/TS-2481
             Project: Traffic Server
          Issue Type: Bug
          Components: HTTP
            Reporter: Dimitry Andric


After the fix for TS-1962, there are some scenarios where Traffic Server can 
decide to reuse an origin server connection (because of keep-alive), even when 
the destination port is different.  I encountered this with 4.0.2, though this 
bug is still in trunk, but it is masked by another fix.

When using Traffic Server 4.0.2 as a transparent proxy, two requests that are 
made to the same origin server but on different ports, may be sent to the same 
port.

All GETs work as expected:
* GET to origin:80 arrives at origin:80
* GET to origin:80, followed by GET to origin:9999 arrive at origin:80 and
origin:9999 respectively
* GET to origin:9999, followed by GET to origin:80 arrive at origin:9999 and
origin:80 respectively

But stuff gets weird after a POST:
- POST to origin:9999, followed by GET to origin:80, arrive at origin:9999 and
origin:9999 respectively

This results in requests being sent to incorrect origin server ports, which may
result in the HTML app getting unexpected errors, or unexpected data.  A 
workaround is to disable keep-alive, but that is of course bad for performance.

It looks like the problem is in proxy/http/HttpSM.cc, in function 
HttpSM::do_http_server_open():

{noformat}
  4438  void
  4439  HttpSM::do_http_server_open(bool raw)
  4440  {
...
  4510    if (raw == false && t_state.txn_conf->share_server_sessions &&
  4511        (t_state.txn_conf->keep_alive_post_out == 1 || 
t_state.hdr_info.request_content_length == 0) &&
  4512         !is_private() && ua_session != NULL) {
...
  4540    else if ((!t_state.txn_conf->share_server_sessions || is_private()) 
&& (ua_session != NULL)) {
  4541      HttpServerSession *existing_ss = ua_session->get_server_session();
  4542
  4543      if (existing_ss) {
  4544        // [amc] Is this OK? Should we compare ports? (not done by 
ats_ip_addr_cmp)
  4545        if (ats_ip_addr_eq(&existing_ss->server_ip.sa, 
&t_state.current.server->addr.sa)) {
  4546          ua_session->attach_server_session(NULL);
  4547          existing_ss->state = HSS_ACTIVE;
  4548          this->attach_server_session(existing_ss);
  4549          hsm_release_assert(server_session != NULL);
  4550          handle_http_server_open();
  4551          return;
  4552        } else {
  4553          // As this is in the non-sharing configuration, we want to close
  4554          // the existing connection and call connect_re to get a new one
  4555          existing_ss->release();
  4556          ua_session->attach_server_session(NULL);
  4557        }
  4558      }
  4559    }
{noformat}

When the condition in line 4540 is hit, and get_server_session() returns an 
existing session, the addresses of the existing session and the current one are 
compared (line 4545), but *not* the port numbers, as the comment in line 4544 
indicates.  This is incorrect, as an existing session to a different port on 
the same origin server should never be reused.

The most obvious fix is to add a comparison of the ports, e.g.:

{noformat}
      if (ats_ip_addr_eq(&existing_ss->server_ip.sa, 
&t_state.current.server->addr.sa) &&
          ats_ip_port_cast(&existing_ss->server_ip.sa) == 
ats_ip_port_cast(&t_state.current.server->addr)) {
{noformat}

I have added this locally as a patch, and with that, the scenario described at 
the top of this bug does not occur anymore.

Note that the issue in this particular scenario seems to be masked by the fix 
for TS-312.  Initially, I tried reproducing with trunk, but I found it only 
occurs before that fix.  After TS-312, the condition in line 4540 above is not 
hit anymore, but the first one in line 4510 is hit instead.  However, there may 
still be ways to make the execution flow branch to line 4545, and that could 
still trigger the bug.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to