[
https://issues.apache.org/jira/browse/TS-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865405#comment-13865405
]
Dimitry Andric edited comment on TS-2481 at 1/8/14 1:18 PM:
------------------------------------------------------------
Suggested fix for TS-2481.
was (Author: dim):
Suggested fix for TS-1962.
> Incorrect origin server port used sometimes (with keep-alive)
> -------------------------------------------------------------
>
> Key: TS-2481
> URL: https://issues.apache.org/jira/browse/TS-2481
> Project: Traffic Server
> Issue Type: Bug
> Components: HTTP
> Reporter: Dimitry Andric
> Attachments: fix-ts-2481-1.diff
>
>
> After the fix for TS-1962, there are some scenarios where Traffic Server can
> decide to reuse an origin server connection (because of keep-alive), even
> when the destination port is different. I encountered this with 4.0.2,
> though this bug is still in trunk, but it is masked by another fix.
> When using Traffic Server 4.0.2 as a transparent proxy, two requests that are
> made to the same origin server but on different ports, may be sent to the
> same port.
> All GETs work as expected:
> * GET to origin:80 arrives at origin:80
> * GET to origin:80, followed by GET to origin:9999 arrive at origin:80 and
> origin:9999 respectively
> * GET to origin:9999, followed by GET to origin:80 arrive at origin:9999 and
> origin:80 respectively
> But stuff gets weird after a POST:
> - POST to origin:9999, followed by GET to origin:80, arrive at origin:9999 and
> origin:9999 respectively
> This results in requests being sent to incorrect origin server ports, which
> may
> result in the HTML app getting unexpected errors, or unexpected data. A
> workaround is to disable keep-alive, but that is of course bad for
> performance.
> It looks like the problem is in proxy/http/HttpSM.cc, in function
> HttpSM::do_http_server_open():
> {noformat}
> 4438 void
> 4439 HttpSM::do_http_server_open(bool raw)
> 4440 {
> ...
> 4510 if (raw == false && t_state.txn_conf->share_server_sessions &&
> 4511 (t_state.txn_conf->keep_alive_post_out == 1 ||
> t_state.hdr_info.request_content_length == 0) &&
> 4512 !is_private() && ua_session != NULL) {
> ...
> 4540 else if ((!t_state.txn_conf->share_server_sessions || is_private())
> && (ua_session != NULL)) {
> 4541 HttpServerSession *existing_ss = ua_session->get_server_session();
> 4542
> 4543 if (existing_ss) {
> 4544 // [amc] Is this OK? Should we compare ports? (not done by
> ats_ip_addr_cmp)
> 4545 if (ats_ip_addr_eq(&existing_ss->server_ip.sa,
> &t_state.current.server->addr.sa)) {
> 4546 ua_session->attach_server_session(NULL);
> 4547 existing_ss->state = HSS_ACTIVE;
> 4548 this->attach_server_session(existing_ss);
> 4549 hsm_release_assert(server_session != NULL);
> 4550 handle_http_server_open();
> 4551 return;
> 4552 } else {
> 4553 // As this is in the non-sharing configuration, we want to
> close
> 4554 // the existing connection and call connect_re to get a new
> one
> 4555 existing_ss->release();
> 4556 ua_session->attach_server_session(NULL);
> 4557 }
> 4558 }
> 4559 }
> {noformat}
> When the condition in line 4540 is hit, and get_server_session() returns an
> existing session, the addresses of the existing session and the current one
> are compared (line 4545), but *not* the port numbers, as the comment in line
> 4544 indicates. This is incorrect, as an existing session to a different
> port on the same origin server should never be reused.
> The most obvious fix is to add a comparison of the ports, e.g.:
> {noformat}
> if (ats_ip_addr_eq(&existing_ss->server_ip.sa,
> &t_state.current.server->addr.sa) &&
> ats_ip_port_cast(&existing_ss->server_ip.sa) ==
> ats_ip_port_cast(&t_state.current.server->addr)) {
> {noformat}
> I have added this locally as a patch, and with that, the scenario described
> at the top of this bug does not occur anymore.
> Note that the issue in this particular scenario seems to be masked by the fix
> for TS-312. Initially, I tried reproducing with trunk, but I found it only
> occurs before that fix. After TS-312, the condition in line 4540 above is
> not hit anymore, but the first one in line 4510 is hit instead. However,
> there may still be ways to make the execution flow branch to line 4545, and
> that could still trigger the bug.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)