Re: Issue after upgrade from 1.7 to 1.8 related with active sessions

Willy Tarreau Fri, 22 Dec 2017 09:57:55 -0800

Hi Ricardo,

On Fri, Dec 22, 2017 at 12:37:42PM +0100, Ricardo Fraile wrote:
> Continuing with the investigation, I changed the listen only to this:
> 
> listen proxy-test-tcp
>         bind *:81
>         option tcplog
>         server test1 192.168.1.101:80
> 
> 
> And the difference between 1.7 and 1.8 tracing the process who receive
> only 1 request is that the shutdown of the socket who receives the
> request fails with an ENOTCONN. In 1.8 continue in CLOSE_WAIT a few
> time, meanwhile in 1.7 pass to TIME_WAIT as usual.


(...)

I finally found it thanks to all your information and to Christopher's
bisect. I've just fixed it now with the attached patch. Feel free to
retest it, but I'm confident I can issue 1.8.2 now.

Many thanks for your very detailed report!

Willy

>From 58d05368d690b0a5962bafa38595dca7bec9ee67 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Fri, 22 Dec 2017 18:46:33 +0100
Subject: BUG/MAJOR: connection: refine the situations where we don't send
 shutw()

Since commit f9ce57e ("MEDIUM: connection: make conn_sock_shutw() aware
of lingering"), we refrain from performing the shutw() on the socket if
there is no lingering risk. But there is a problem with this in tunnel
and in TCP modes where a client is explicitly allowed to send a shutw
to the server, eventhough it it risky.

Not doing it creates this situation reported by Ricardo Fraile and
diagnosed by Christopher : a typical HTTP client (eg: curl) connecting
via the config below to an HTTP server would receive its response,
immediately close while the server remains in keep-alive mode. The
shutr() received by haproxy from the client is "propagated" to the
server side but not acted upon because fdtab[fd].linger_risk is set,
so we expect that the next close will immediately complete this
operation.

  listen proxy-tcp
    bind 127.0.0.1:8888
    mode tcp
    timeout connect 5s
    timeout server  10s
    timeout client  10s
    server server1 127.0.0.1:8000

But since the whole stream will not end until the server closes in
turn, the server doesn't close and haproxy expires on server timeout.
This problem has already struck by waking up an older bug and was
partially fixed with commit 8059351 ("BUG/MEDIUM: http: don't disable
lingering on requests with tunnelled responses") though it was not
enough.

The problem is that linger_risk is not suited here. In fact we need to
know whether or not it is desired to close normally or silently, and
whether or not a shutr() has already been received on this connection.

This is the approach this patch takes, and it solves the problem for
the various difficult modes (tcp, http-server-close, pretend-keepalive).

This fix needs to be backported to 1.8. Many thanks to Ricardo for
providing very detailed traces and configurations.
(cherry picked from commit a48c141f448e04f6da8a5b40c677042fbc730f04)

Signed-off-by: Willy Tarreau <[email protected]>
---
 include/proto/connection.h | 12 ++++++++----
 src/mux_pt.c               |  2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/proto/connection.h b/include/proto/connection.h
index 0efce99..6411764 100644
--- a/include/proto/connection.h
+++ b/include/proto/connection.h
@@ -505,17 +505,21 @@ static inline void conn_sock_read0(struct connection *c)
 }

 /* write shutdown, indication that the upper layer is not willing to send
- * anything anymore and wants to close after pending data are sent.
+ * anything anymore and wants to close after pending data are sent. The
+ * <clean> argument will allow not to perform the socket layer shutdown if
+ * equal to 0.
  */
-static inline void conn_sock_shutw(struct connection *c)
+static inline void conn_sock_shutw(struct connection *c, int clean)
 {
        c->flags |= CO_FL_SOCK_WR_SH;
        conn_refresh_polling_flags(c);
        __conn_sock_stop_send(c);
        conn_cond_update_sock_polling(c);

-       /* don't perform a clean shutdown if we're going to reset */
-       if (conn_ctrl_ready(c) && !fdtab[c->handle.fd].linger_risk)
+       /* don't perform a clean shutdown if we're going to reset or
+        * if the shutr was already received.
+        */
+       if (conn_ctrl_ready(c) && !(c->flags & CO_FL_SOCK_RD_SH) && clean)
                shutdown(c->handle.fd, SHUT_WR);
 }

diff --git a/src/mux_pt.c b/src/mux_pt.c
index b32fa8e..a68b962 100644
--- a/src/mux_pt.c
+++ b/src/mux_pt.c
@@ -151,7 +151,7 @@ static void mux_pt_shutw(struct conn_stream *cs, enum 
cs_shw_mode mode)
        if (conn_xprt_ready(cs->conn) && cs->conn->xprt->shutw)
                cs->conn->xprt->shutw(cs->conn, (mode == CS_SHW_NORMAL));
        if (!(cs->flags & CS_FL_SHR))
-               conn_sock_shutw(cs->conn);
+               conn_sock_shutw(cs->conn, (mode == CS_SHW_NORMAL));
        else
                conn_full_close(cs->conn);
 }
-- 
1.7.12.1

Re: Issue after upgrade from 1.7 to 1.8 related with active sessions

Reply via email to