Hi,
I've been working on an issue we've been seeing on very high loads with
a configuration that boils down to:
[SSL Traffic] ---> [HAProxy] ---[via send_proxy]--> [Proxy]
We're seeing this with 1.5.12, but the latest git behaves in the same way
(patches below are against latest git).
We have custom code in conn_si_send_proxy() that looks at
remote->t.sock.fd, and does a getsockname() on it. Sometimes that call
fails with EBADFD.
It appears that the reason this is happening is that the frontend socket
has been closed via fd_delete() before we go and try to open a socket to
the backend (note that we could have checked conn->flags and seen that
the CO_FL_ERROR flag was set).
I believe the decision to proceed to open a socket happens when some
data was read from the request and an error was received from SSL in the
same conn_fd_handler() call, so it's a pretty narrow window.
When that happens CO_FL_ERROR is set *and* the input channel is non-empty.
Back in process_stream(), we hit this condition:
2187		if (unlikely((req->flags & (CF_SHUTW|CF_SHUTW_NOW)) == CF_SHUTW_NOW &&
2188			     channel_is_empty(req))) {
2189			if (req->flags & CF_READ_ERROR)
2190				si_b->flags |= SI_FL_NOLINGER;
2191			si_shutw(si_b);
2192			if (tick_isset(s->be->timeout.serverfin)) {
2193				res->rto = s->be->timeout.serverfin;
2194				res->rex = tick_add(now_ms, res->rto);
2195			}
2196		}
req->flags is CF_SHUTW_NOW, but the channel is not empty, so we don't
call si_shutw(si_b). Then further down:
2215		if (si_b->state == SI_ST_INI) {
2216			if (!(req->flags & CF_SHUTW)) {
2217				if ((req->flags & CF_AUTO_CONNECT) || !channel_is_empty(req)) {
2218					/* If we have an appctx, there is no connect method, so we
2219					 * immediately switch to the connected state, otherwise we
2220					 * perform a connection request.
2221					 */
2222					si_b->state = SI_ST_REQ; /* new connection requested */
2223					si_b->conn_retries = s->be->conn_retries;
2224				}
2225			}
2226			else {
2227				si_b->state = SI_ST_CLO; /* shutw+ini = abort */
2228				channel_shutw_now(req);  /* fix buffer flags upon abort */
2229				channel_shutr_now(res);
2230			}
2231		}
.... req->flags is still CF_SHUTW_NOW, so we don't go to the 'else' and we
don't abort the connection.
It seems we would be a bit more efficient if we also shut down when
si_b->state is SI_ST_INI: that is, don't even try to open a connection
to the backend if we're already shutting down the frontend.
Something like the patch below:
diff --git a/src/stream.c b/src/stream.c
index 4d87fc9..fe9d04a 100644
--- a/src/stream.c
+++ b/src/stream.c
@@ -2183,9 +2183,12 @@ struct task *process_stream(struct task *t)
 			channel_shutw_now(req);
 		}
 
-		/* shutdown(write) pending */
+		/* shutdown(write) pending.
+		 * Shut down if the channel is empty or if we
+		 * haven't connected to the backend yet.
+		 */
 		if (unlikely((req->flags & (CF_SHUTW|CF_SHUTW_NOW)) == CF_SHUTW_NOW &&
-			     channel_is_empty(req))) {
+			     (si_b->state == SI_ST_INI || channel_is_empty(req)))) {
 			if (req->flags & CF_READ_ERROR)
 				si_b->flags |= SI_FL_NOLINGER;
 			si_shutw(si_b);
I've managed to set up a small repro environment with the following config:
======== cfg ============
global
    daemon
    maxconn 256
    stats socket /tmp/haproxy.sock.1 mode 600 level admin process 1

defaults
    mode tcp
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:1443 alpn http/1.1,http/1.0 defer-accept ssl crt /<path to test key>/example.pem
    default_backend servers

backend servers
    # This is not really a proxy, it's just a placeholder so that we
    # call conn_si_send_proxy()
    server server1 www.fastly.com:443 ssl verify none send-proxy-v2 maxconn 32
    balance leastconn
======== /cfg ============
And the following patch in order to simulate an SSL read error:
diff --git a/src/connection.c b/src/connection.c
index b8be0a1..4e50f89 100644
--- a/src/connection.c
+++ b/src/connection.c
@@ -755,6 +755,10 @@ int make_proxy_line_v2(char *buf, int buf_len, struct server *srv, struct connec
 	struct chunk *cn_trash;
 #endif
 
+	if (remote->flags & CO_FL_ERROR) {
+		fprintf(stderr, "Remote has the error flag set\n");
+	}
+
 	if (buf_len < PP2_HEADER_LEN)
 		return 0;
 	memcpy(hdr->sig, pp2_signature, PP2_SIGNATURE_LEN);
diff --git a/src/ssl_sock.c b/src/ssl_sock.c
index 1017388..3b3249b 100644
--- a/src/ssl_sock.c
+++ b/src/ssl_sock.c
@@ -3555,6 +3555,7 @@ reneg_ok:
  * avoiding the call if inappropriate. The function does not call the
  * connection's polling update function, so the caller is responsible for this.
  */
+int doit;
 static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int count)
 {
 	int ret, done = 0;
@@ -3567,6 +3568,13 @@ static int ssl_sock_to_buf(struct connection *conn, struct buffer *buf, int coun
 		/* a handshake was requested */
 		return 0;
 
+	doit++;
+	sleep(1);
+	if (doit == 2) {
+		fprintf(stderr, "simulating a bogus ssl error\n");
+		goto out_error;
+	}
+
 	/* let's realign the buffer to optimize I/O */
 	if (buffer_empty(buf))
 		buf->p = buf->data;
Without the proposed fix we see:
Starting program: /home/def/p/floss/haproxy/haproxy -dM -d -f ./test.cfg
[WARNING] 084/152625 (44137) : Setting tune.ssl.default-dh-param to 1024 by default, if your workload permits it you should set it to at least 2048. Please set a value >= 1024 to make this warning disappear.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Using epoll() as the polling mechanism.
00000000:http-in.accept(0005)=0006 from [127.0.0.1:57140]
simulating a bogus ssl error
00000000:servers.clicls[0006:0006]
Remote has the error flag set
Remote has the error flag set
00000000:servers.closed[0006:0006]
00000001:http-in.accept(0005)=0006 from [127.0.0.1:57146]
With the proposed fix:
Starting program: /home/def/p/floss/haproxy/haproxy -dM -d -f ./test.cfg
[WARNING] 084/154206 (44507) : Setting tune.ssl.default-dh-param to 1024 by default, if your workload permits it you should set it to at least 2048. Please set a value >= 1024 to make this warning disappear.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Using epoll() as the polling mechanism.
00000000:http-in.accept(0005)=0006 from [127.0.0.1:57186]
simulating a bogus ssl error
00000000:servers.clicls[0006:ffffffff]
00000000:servers.closed[0006:ffffffff]
Any thoughts? I realize just checking CO_FL_ERROR on remote->flags
would have been easier, but I wanted to make sure I wasn't missing
something else.
Also, I wasn't entirely clear on whether this is something that should
take 'abortonclose' into account?
Thanks,
Frederik