On Fri, Mar 17, 2017 at 12:10:43PM +0300, Dmitry Sivachenko wrote: > > > On 17 Mar 2017, at 12:04, Willy Tarreau <[email protected]> wrote: > > > > Hi Dmitry, > > > > On Wed, Mar 15, 2017 at 12:45:54AM +0300, Dmitry Sivachenko wrote: > >> I committed your patch to FreeBSD ports. > > > > I was just reported an undesired side effect of this patch with smtp > > in clear without proxy-proto :-( > > > > [...] > > > > Okay, thanks for information. I have no other complains so far, so I wait a > bit for an update.
OK here's a temporary patch. It includes a revert of the previous one and adds a condition for the wake-up. At least it passes all my tests, including those involving synchronous connection reports. I'm not merging it yet as I'm wondering whether a reliable definitive solution should be done once for all (and backported) by addressing the root cause instead of constantly working around its consequences. Willy
>From 42121fa95cf77be0384ab6ce597a88c2fe6e2cd6 Mon Sep 17 00:00:00 2001 From: Willy Tarreau <[email protected]> Date: Sat, 18 Mar 2017 12:04:38 +0100 Subject: WIP/BUG/MAJOR: connection: also call ->wake() after end of handshakes X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 We have an issue with the relation between CO_FL_CONNECTED and the various handshake flags which was uncovered by recent fix 7bf3fa3 ("BUG/MAJOR: connection: update CO_FL_CONNECTED before calling the data layer"). The flag was normally set *after* calling ->wake() to indicate that both the control layer and the transport layer are ready, and that the connection is ready to exchange data. It is important that the ->wake() call sees the transition with both CO_FL_{L4,L6}_CONN cleared and CO_FL_CONNECTED not yet set, to detect the status change. But we have an issue with handshakes like the proxy protocol. These ones have to be completed before the data layer may be used, and after completion polling for data may be enabled. The problem comes from the relation between CO_FL_CONNECT and the handshakes. In fact, the SSL handshake is used to complete the transport layer setup, so there's a direct 1-to-1 relation between L6_CONN and the SSL handshake. It will be broken if we later decide to implement upgradable SSL by the way. Some handshakes like accepting the proxy protocol or sending it area easily hidden behind the SSL handshake (which has to be performed on top of it), so they're done with CO_FL_CONNECTED unset. But if we only have the proxy protocol and no SSL, we end up in a situation where CO_FL_CONNECTED might already be set and where the removal of the handshake flag doesn't constitute a sufficient condition to notify the data layer about the availability of the connection, which fails each time the connection succeeds synchronously without any data to send. It could be argued that the handshake flags have to be made part of the condition to et CO_FL_CONNECTED but that would currently break a part of the health checks. Also a handshake could appear at any moment even after a connection is established so we'd lose the ability to detect a second end of handshake. For now the situation is not clean : - session_accept() only sets CO_FL_CONNECTED if there's no pending handshake ; - conn_fd_handler() will set it once L4 and L6 are complete, which will do what session_accept() above refrained from doing even if an accept_proxy handshake is still pending ; - ssl_sock_info_cbk() and ssl_sock_handshake() consider that a handshake performed with CO_FL_CONNECTED set is a renegociation ; - all ssl_fc_* sample fetch functions wait for CO_FL_CONNECTED before accepting to fetch information Only health checks validate the end of handshakes and the connected flag separately. This patch aims at solving the side effects in a backportable way before this is reworked in depth : - we need to call ->wake() to report connection success, measure connection time, and notify that the data layer is ready ; this has to be done either if we switch from pending {L4,L6}_CONN to nothing, or if we notice some handshakes were pending and are now done. - we must only set CO_FL_CONNECTED after calling ->wake() because this one requires all flags down to detect the transition ; this rolls back previous change ; - we must ensure that we'll see CO_FL_{CONNECTED,L4,L6} cleared at least once in each connection's lifetime to ensure no wake up event may be lost. - we document that CO_FL_CONNECTED exactly means "L4 connection setup confirmed at least once, L6 connection setup confirmed at least once or not necessary, all this regardless of any possibly remaining handshakes or future L6 negociations". For later, si_conn_wake_cb() has to be fixed not to consider the connection's flags but the stream-interface's and channel status instead, making it agnostic to the change on the CO_FL_CONNECTED flag, and other data layers (checks) will likely have to do something similar. --- include/types/connection.h | 2 +- src/connection.c | 25 ++++++++++++++++++------- 2 files changed, 19 insertions(+), 8 deletions(-) diff --git a/include/types/connection.h b/include/types/connection.h index c644dd5..6ec0e4f 100644 --- a/include/types/connection.h +++ b/include/types/connection.h @@ -98,7 +98,7 @@ enum { /* flags used to report connection status and errors */ CO_FL_ERROR = 0x00100000, /* a fatal error was reported */ - CO_FL_CONNECTED = 0x00200000, /* the connection is now established */ + CO_FL_CONNECTED = 0x00200000, /* L4+L6 now ready ; extra handshakes may or may not exist */ CO_FL_WAIT_L4_CONN = 0x00400000, /* waiting for L4 to be connected */ CO_FL_WAIT_L6_CONN = 0x00800000, /* waiting for L6 to be connected (eg: SSL) */ diff --git a/src/connection.c b/src/connection.c index bc34ae7..ab5fcd4 100644 --- a/src/connection.c +++ b/src/connection.c @@ -132,21 +132,32 @@ void conn_fd_handler(int fd) } leave: - /* Verify if the connection just established. The CO_FL_CONNECTED flag - * being included in CO_FL_CONN_STATE, its change will be noticed by - * the next block and be used to wake up the data layer. + /* The wake callback is normally used to notify about connection establishment + * and critical errors. The called functions need to detect the change so the + * CO_FL_CONNECTED flag must not have been updated yet. However we need to + * consider the following situations to wake up the data layer : + * - change among the CO_FL_CONN_STATE flags : + * {DATA,SOCK}_{RD,WR}_SH, ERROR, CONNECTED, WAIT_L4_CONN, WAIT_L6_CONN + * - absence of any of {L4,L6}_CONN and CONNECTED, indicating the + * end of handshake and transition to CONNECTED + * - end of handshakes */ - if (unlikely(!(conn->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED)))) - conn->flags |= CO_FL_CONNECTED; /* The wake callback may be used to process a critical error and abort the * connection. If so, we don't want to go further as the connection will * have been released and the FD destroyed. */ if ((conn->flags & CO_FL_WAKE_DATA) && - ((conn->flags ^ flags) & CO_FL_CONN_STATE) && - conn->data->wake(conn) < 0) + (((conn->flags ^ flags) & CO_FL_CONN_STATE) || + ((flags & CO_FL_HANDSHAKE) && !(conn->flags & CO_FL_HANDSHAKE))) && + conn->data->wake(conn) < 0) { return; + } + + /* Now set the CO_FL_CONNECTED flag if the connection was just established. */ + + if (unlikely(!(conn->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED)))) + conn->flags |= CO_FL_CONNECTED; /* remove the events before leaving */ fdtab[fd].ev &= FD_POLL_STICKY; -- 2.8.0.rc2.1.gbe9624a

