Re: 2x filter + keep-alive regressions (1.7 affected)

Christopher Faulet Thu, 06 Jul 2017 14:01:56 -0700

Le 30/06/2017 à 22:09, Lukas Tribus a écrit :

Hello Christopher, William, Willy, et all!



Matt McDonagh reported a regression on discourse [1] in 1.7.6, that
causes haproxy to ignore "timeout http-keep-alive" when going through
filters (aka compression is enabled) and also causes logging to be
delayed.

Because timeouts are ignored and wrong timeouts strike, performance
issues are likely (as we don't close idle sessions as expected, we hit
maxconn sooner, etc).

The root cause are actually 2 different patches in 1.7 stable. This is
getting a bit complex here, because in 1.7 both patches cause
regressions, while in 1.8 only one of those 2 patches actually cause an
issue.


We are talking about:
2b553de BUG/MINOR: filters: Don't force the stream's wakeup when we wait in 
flt_end_analyze
c0c672a BUG/MINOR: http: Fix conditions to clean up a txn and to handle the 
next request


Repro config (I used netcat for a HTTP/1.1 request):

global
  maxconn 100
  log 10.0.0.4 syslog debug
defaults
  log global
  mode http
  option httplog
  option http-server-close
  #option http-keep-alive
  timeout connect 5s
  timeout client 10s
  timeout server 20s
  timeout http-keep-alive 1s
frontend myfrontend
  bind :80
  default_backend mybackend
backend mybackend
  compression algo gzip
  server www www.lan.ltri.eu:80


In v1.7.4 (everything fine):
"mode http-server-close":
   as expected (logs immediately, closes idle keep-alive at
   "timeout http-keep-alive")

"mode http-keep-alive":
   as expected (logs immediately, closes idle keep-alive at
   "timeout http-keep-alive")


In v1.7.5 (due to b0c3fd3b BUG/MINOR: filters: Don't force the stream's wakeup 
when we wait in flt_end_analyze):
"mode http-server-close":
   log stalled and keep-alive idle close at "timeout connect";
   ignores "timeout http-keep-alive"

"mode http-keep-alive":
   log stalled and keep-alive idle close at "timeout connect + client";
   ignores "timeout http-keep-alive"


In v1.7.6 (due to 73d071e BUG/MINOR: http: Fix conditions to clean up a txn and 
to handle the next request):
"mode http-server-close":
   log stalled and keep-alive idle close at "timeout client";
   ignores "timeout http-keep-alive"

"mode http-keep-alive":
   log stalled for "timeout server"; keep-alive stalled for
   "timeout server (then log) + timeout http-keep-alive"



As for 1.8:
Patches have been applied as per Author-Date, while in 1.7 the
patches have been applied in reverse order.

"BUG/MINOR: http: Fix conditions to clean up a txn and to handle the next 
request"
  -> does not cause issues in 1.8

"BUG/MINOR: filters: Don't force the stream's wakeup when we wait in 
flt_end_analyze"
  -> has the same effect as in 1.7(.6).


The 1.6/1.5 situation is unkown at this time.


Let me know if something is unclear (I'm almost certain it is) ...



cheers,
lukas


[1] http://discourse.haproxy.org/t/keep-alive-behaviour-change-since-1-7-6/1390


Hi guys,

Attached patches should fix this bug. The real fix is in the last one.But all the 3 must be backported in 1.7. I made tests with the Lukasconfig and http-keep-alive timeout is now respected. But because filtersseems to generate many bugs these last weeks, a double check is welcomed :)


Many thanks Lukas.

--
Christopher Faulet

>From 6b6d17a120b1af9500d530403963adb4cf43d59b Mon Sep 17 00:00:00 2001
From: Christopher Faulet <cfau...@haproxy.com>
Date: Thu, 6 Jul 2017 15:49:30 +0200
Subject: [PATCH 1/3] BUG/MINOR: stream: Don't forget to remove CF_WAKE_ONCE
 flag on response channel

This flag can be set on a channel to pretend there is activity on it. This is a
way to wake-up the corresponding stream and evaluate stream analyzers on the
channel. It is correctly handled on both channels but removed only on the
request channel.

This patch is flagged as a bug but for now, CF_WAKE_ONCE is never set on the
response channel.
---
 src/stream.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/stream.c b/src/stream.c
index 4e34f38..1aa5475 100644
--- a/src/stream.c
+++ b/src/stream.c
@@ -1881,6 +1881,7 @@ struct task *process_stream(struct task *t)
 
 		rp_cons_last = si_f->state;
 		rp_prod_last = si_b->state;
+		res->flags &= ~CF_WAKE_ONCE;
 		rpf_last = res->flags;
 
 		if ((res->flags ^ flags) & CF_MASK_STATIC)
-- 
2.9.4

>From 8868eedf682b978082c46179faffa49919735d61 Mon Sep 17 00:00:00 2001
From: Christopher Faulet <cfau...@haproxy.com>
Date: Thu, 6 Jul 2017 15:51:35 +0200
Subject: [PATCH 2/3] BUG/MINOR: http: Don't reset the transaction if there are
 still data to send

To reset an HTTP transaction, we need to be sure all data were sent, for the
request and the response. There are tests on request and response buffers for
that in http_resync_states function. But the return code was wrong. We must
return 0 to wait.

This patch must be backported in 1.7
---
 src/proto_http.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/src/proto_http.c b/src/proto_http.c
index 7141833..e0ed2da 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -5654,16 +5654,18 @@ int http_resync_states(struct stream *s)
 		 * and the response buffer must realigned
 		 * (realign is done is http_end_txn_clean_session).
 		 */
-		if (s->req.buf->o)
+		if (s->req.buf->o) {
 			s->req.flags |= CF_WAKE_WRITE;
-		else if (s->res.buf->o)
+			return 0;
+		}
+		else if (s->res.buf->o) {
 			s->res.flags |= CF_WAKE_WRITE;
-		else {
-			s->req.analysers = AN_REQ_FLT_END;
-			s->res.analysers = AN_RES_FLT_END;
-			txn->flags |= TX_WAIT_CLEANUP;
-			return 1;
+			return 0;
 		}
+		s->req.analysers = AN_REQ_FLT_END;
+		s->res.analysers = AN_RES_FLT_END;
+		txn->flags |= TX_WAIT_CLEANUP;
+		return 1;
 	}
 
 	return txn->req.msg_state != old_req_state ||
-- 
2.9.4

>From ddf318e83f2a0d52636938cad84ad20939f067a4 Mon Sep 17 00:00:00 2001
From: Christopher Faulet <cfau...@haproxy.com>
Date: Thu, 6 Jul 2017 15:53:02 +0200
Subject: [PATCH 3/3] BUG/MEDIUM: filters: Be sure to call flt_end_analyze for
 both channels

In the commit 2b553de5 ("BUG/MINOR: filters: Don't force the stream's wakeup
when we wait in flt_end_analyze"), we removed a task_wakeup in flt_end_analyze
to no consume too much CPU by looping in certain circumstances.

But this fix was too drastic. For Keep-Alive transactions, flt_end_analyze is
often called only for the response. Then the stream is paused until a timeout is
hitted or the next request is received. We need first let a chance to both
channels to call flt_end_analyze function. Then if a filter need to wait here,
it is its responsibility to wake up the stream when needed. To fix the bug, and
thanks to previous commits, we set the flag CF_WAKE_ONCE on channels to pretend
there is an activity. On the current channel, the flag will be removed without
any effect, but for the other side the analyzer will be called immediatly.

Thanks for Lukas Tribus for his detailed analysis of the bug.

This patch must be backported in 1.7 with the 2 previous ones:

  * 8868eed ("BUG/MINOR: http: Don't reset the transaction if there are still data to send")
  * 6b6d17a ("BUG/MINOR: stream: Don't forget to remove CF_WAKE_ONCE flag on response channel")
---
 src/filters.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/filters.c b/src/filters.c
index 68bf7ee..0d0e5e8 100644
--- a/src/filters.c
+++ b/src/filters.c
@@ -843,9 +843,21 @@ flt_end_analyze(struct stream *s, struct channel *chn, unsigned int an_bit)
 	/* We don't remove yet this analyzer because we need to synchronize the
 	 * both channels. So here, we just remove the flag CF_FLT_ANALYZE. */
 	ret = handle_analyzer_result(s, chn, 0, ret);
-	if (ret)
+	if (ret) {
 		chn->flags &= ~CF_FLT_ANALYZE;

+		/* Pretend there is an activity on both channels. Flag on the
+		 * current one will be automatically removed, so only the other
+		 * one will remain. This is a way to be sure that
+		 * 'channel_end_analyze' callback will have a chance to be
+		 * called at least once for the other side to finish the current
+		 * processing. Of course, this is the filter responsiblity to
+		 * wakeup the stream if it choose to loop on this callback. */
+		s->req.flags |= CF_WAKE_ONCE;
+		s->res.flags |= CF_WAKE_ONCE;
+	}
+
+
  sync:
 	/* Now we can check if filters have finished their work on the both
 	 * channels */
-- 
2.9.4

Re: 2x filter + keep-alive regressions (1.7 affected)

Reply via email to