Hi Tim,
On Sun, May 20, 2018 at 05:55:06PM +0200, Tim Duesterhus wrote:
> Willy,
>
> attached is a first attempt at a patch that adds logging (without any rate
> limiting). I have a few questions regarding the whole counters and logging
> infrastructure:
>
> 1. I noticed that there is ha_warning(...) and send_log(p, LOG_WARNING, ...),
> usually both are used in the same place. What's the difference between
> them? Are the log lines in my patch going into the correct place?
ha_warning() is used to emit warnings on stderr during startup, mainly
when parsing the config. Example of such warnings are "missing timeouts
for proxy foobar". send_log() is used at run time to either report activity
(regular logs) or send notification about noticeable events like servers
going up/down, or risks for the reliability due to external events (no more
FDs) or issues like this one.
Historically we've accumulated a few ha_warning() in runtime code paths,
but each time you see this it's an error. Sometimes due to a thinko,
sometimes due to code being moved or reused.
> 2. How do you prefer the rate limiting of the log lines?
> - Is there an existing "flag" for rate limited logs?
Not that I'm aware of.
> - Where should the information about the rate limited logging be stored
> (in a static variable?)?
At a few places (which ones, I don't remember) we have a local static
variable to report certain elements only once. It could be one option.
But the problem is that if the first event was missed, the new ones
will be missed as well.
> - Log only every Xth failed call to http_header_add_tail2?
It could be the easiest and most convenient solution for the mid term
I think. E.g. just log once every 256 using "if (!(counter++ & 255)) ..."
> - By time? This would require an additional call to gettimeofday, no?
It's not welcome there. We could rely on a freq_counter though, but I'm
not sure it's worth the extra pain (enthough it's likely that over the
long term that's what it will evolve to). If you prefer to do it anyway,
it will look roughly like this :
#define LIMIT 10 // burst of 10 messages max
#define PERIOD 10000 // over 10 seconds
static struct freq_ctr_period warn_limiter;
static unsigned int skipped;
if (freq_ctr_remain_period(&warn_limiter, PERIOD, LIMIT, 0)) {
update_freq_ctr_period(&warn_limiter, PERIOD, 1);
send_log("log: blah (skipped %d messages)", skipped);
skipped=0;
} else {
skipped++;
}
> - Should the code have separate rate limiting per header (I guess no,
> because the only meaningful thing a administrator can do is increase
> tune.maxrewrite and that fixes all headers at once)?
It would really be overkill and would provide little to no value. As you
say, what matters is to alert against too low maxrewrite. However using a
dedicated log function to warn about such problems and maintain its own
counter definitely makes sense to me.
> 3. Regarding the counters I noticed frontends don't really seem to use
> error counters yet. Do I need to add a counter to both
> struct fe_counters as well as struct be_counters and increment depending
> on whether the (add|set)-header line is in a backend or frontend?
I'm not sure what you mean about frontends not using error counters,
because the request parsing issues are reported there, just like some of
the errors causing an error to be reported. The frontend currently tracks
failed_req, denied_{conn,sess,req,resp}, and {cli,srv}_aborts.
> diff --git a/src/proto_http.c b/src/proto_http.c
> index 3adb54f2..9739fc7a 100644
> --- a/src/proto_http.c
> +++ b/src/proto_http.c
> @@ -2631,7 +2631,11 @@ resume_execution:
> }
> }
>
> - http_header_add_tail2(&txn->req, &txn->hdr_idx,
> replace->str, replace->len);
> + if (http_header_add_tail2(&txn->req, &txn->hdr_idx,
> replace->str, replace->len) < 0) {
> + replace->str[rule->arg.hdr_add.name_len] = 0;
> + ha_warning("Proxy %s failed to add or set the
> request header '%s'. You might need to increase tune.maxrewrite.\n", px->id,
> replace->str);
> + send_log(px, LOG_WARNING, "Proxy %s failed to
> add or set the request header '%s'. You might need to increase
> tune.maxrewrite.", px->id, replace->str);
> + }
In my opinion it could be useful to add "for request #%u",s->uniq_id.
This ID can be retrieved using "%rt" in the logs, so for those who need
to understand what request broke in order to know a bit more about it,
it will definitely help.
thanks,
Willy