with the attachment now (thanks Dmitry)

On Mon, Mar 06, 2017 at 10:44:56AM +0100, Willy Tarreau wrote:
> On Mon, Mar 06, 2017 at 09:59:21AM +0100, Matthias Fechner wrote:
> > Hi Georg,
> > 
> > Am 06.03.2017 um 09:43 schrieb Georg Faerber:
> > > I'm not running FreeBSD myself, but have a look at [1]: In the
> > > follow-ups to this thread there are two more people reporting problems.
> > >
> > > [1] https://www.mail-archive.com/[email protected]/msg25093.html
> > 
> > no, this cannot be the problem, because this error reported in [1] is
> > related to haproxy version 1.7.2.
> 
> Since we don't know what causes the issue above, we could end up discovering
> that there's a behaviour change that depends on the workload.
> 
> > My problem is related to 1.7.3. The problem was introduced by a change
> > for 1.7.3. as 1.7.2 is running fine.
> 
> Could you retry 1.7.3 by reverting the attached patch ? I don't see
> why it would cause any trouble but that's the only likely candidate
> I'm seeing between 1.7.2 and 1.7.3. If it fixes it, it may indicate
> an issue with our implementation of the kqueue poller, possibly
> explaining the issues other people have reported with fbsd 11 vs 10.
> 
> Also any hint you can provide to help people reproduce it would be welcome!
> 
> Thanks,
> Willy
>From cd4c5a3ecf5e77fb4734c423c914f7280199c763 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Wed, 25 Jan 2017 14:12:22 +0100
Subject: BUG/MEDIUM: tcp: don't poll for write when connect() succeeds
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4

While testing a tcp_fastopen related change, it appeared that in the rare
case where connect() can immediately succeed, we still subscribe to write
notifications on the socket, causing the conn_fd_handler() to immediately
be called and a second call to connect() to be attempted to double-check
the connection.

In fact this issue had already been met with unix sockets (which often
respond immediately) and partially addressed but incorrect so another
patch will follow. But for TCP nothing was done.

The fix consists in removing the WAIT_L4_CONN flag if connect() succeeds
and to subscribe for writes only if some handshakes or L4_CONN are still
needed. In addition in order not to fail raw TCP health checks, we have
to continue to enable polling for data when nothing is scheduled for
leaving and the connection is already established, otherwise the caller
will never be notified.

This fix should be backported to 1.7 and 1.6.
(cherry picked from commit 819efbf4b532d718abeb5e5aa6b2521ed725fe17)
---
 src/proto_tcp.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/src/proto_tcp.c b/src/proto_tcp.c
index f6d8ca1..c04f276 100644
--- a/src/proto_tcp.c
+++ b/src/proto_tcp.c
@@ -474,10 +474,16 @@ int tcp_connect_server(struct connection *conn, int data, 
int delack)
        if (global.tune.server_rcvbuf)
                 setsockopt(fd, SOL_SOCKET, SO_RCVBUF, 
&global.tune.server_rcvbuf, sizeof(global.tune.server_rcvbuf));
 
-       if ((connect(fd, (struct sockaddr *)&conn->addr.to, 
get_addr_len(&conn->addr.to)) == -1) &&
-           (errno != EINPROGRESS) && (errno != EALREADY) && (errno != 
EISCONN)) {
-
-               if (errno == EAGAIN || errno == EADDRINUSE || errno == 
EADDRNOTAVAIL) {
+       if (connect(fd, (struct sockaddr *)&conn->addr.to, 
get_addr_len(&conn->addr.to)) == -1) {
+               if (errno == EINPROGRESS || errno == EALREADY) {
+                       /* common case, let's wait for connect status */
+                       conn->flags |= CO_FL_WAIT_L4_CONN;
+               }
+               else if (errno == EISCONN) {
+                       /* should normally not happen but if so, indicates that 
it's OK */
+                       conn->flags &= ~CO_FL_WAIT_L4_CONN;
+               }
+               else if (errno == EAGAIN || errno == EADDRINUSE || errno == 
EADDRNOTAVAIL) {
                        char *msg;
                        if (errno == EAGAIN || errno == EADDRNOTAVAIL) {
                                msg = "no free ports";
@@ -514,6 +520,10 @@ int tcp_connect_server(struct connection *conn, int data, 
int delack)
                        return SF_ERR_SRVCL;
                }
        }
+       else {
+               /* connect() == 0, this is great! */
+               conn->flags &= ~CO_FL_WAIT_L4_CONN;
+       }
 
        conn->flags |= CO_FL_ADDR_TO_SET;
 
@@ -523,7 +533,6 @@ int tcp_connect_server(struct connection *conn, int data, 
int delack)
 
        conn_ctrl_init(conn);       /* registers the FD */
        fdtab[fd].linger_risk = 1;  /* close hard if needed */
-       conn_sock_want_send(conn);  /* for connect status */
 
        if (conn_xprt_init(conn) < 0) {
                conn_force_close(conn);
@@ -531,6 +540,17 @@ int tcp_connect_server(struct connection *conn, int data, 
int delack)
                return SF_ERR_RESOURCE;
        }
 
+       if (conn->flags & (CO_FL_HANDSHAKE | CO_FL_WAIT_L4_CONN)) {
+               conn_sock_want_send(conn);  /* for connect status, proxy 
protocol or SSL */
+       }
+       else {
+               /* If there's no more handshake, we need to notify the data
+                * layer when the connection is already OK otherwise we'll have
+                * no other opportunity to do it later (eg: health checks).
+                */
+               data = 1;
+       }
+
        if (data)
                conn_data_want_send(conn);  /* prepare to send data if any */
 
-- 
1.7.12.1

Reply via email to