Re: [PATCH] MEDIUM/RFC: Implement time-based server latency metrics

Willy Tarreau Thu, 22 Dec 2016 02:54:12 -0800

Hi Krishna,

On Thu, Dec 22, 2016 at 09:41:49AM +0530, Krishna Kumar (Engineering) wrote:
> We have found that the current mechanism of qtime, ctime, rtime, and ttime
> based on last 1024 requests is not the most suitable to debug/visualize
> latency issues with servers, especially if they happen to last a very short
> time. For live dashboards showing server timings, we found an additional
> last-'n' seconds metrics useful. The logs could also be parsed to derive
> these
> values, but suffers from delays at high volume, requiring higher processing
> power and enabling logs.
> 
> The 'last-n' seconds metrics per server/backend can be configured as follows
> in the HAProxy configuration file:
> backend backend-1
> stats-period 32
> ...
> 
> To retrieve these stats at the CLI (in addition to existing metrics), run:
> echo show stat-duration time 3 | socat /var/run/admin.sock stdio
> 
> These are also available on the GUI.
> 
> The justification for this patch are:
> 1. Allows to capture spikes for a server during a short period. This helps
>    having dashboards that show server response times every few seconds (e.g.
>    every 1 second), so as to be able to chart it across timelines.
> 2. Be able to get an average across different time intervals, e.g.  the
>    configuration file may specify to save the last 32 seconds, but the cli
>    interface can request for average across any interval upto 32 seconds.
> E.g.
>    the following command prints the existing metrics appended by the time
>    based ones for the last 1 second:
> echo show stat-duration time 1 | socat /var/run/admin.sock stdio
>    Running the following existing command appends the time-based metric
> values
>    based on the time period configured in the configuration file per
>    backend/server:
> echo show stat | socat /var/run/admin.sock stdio
> 3. Option per backend for configuring the server stat's time interval, and.
>    no API breakage to stats (new metrics are added at end of line).
> 
> Please review, any feedback on the code/usability/extensibility is very much
> appreciated.


First, thanks for this work. I'm having several concerns and comments however
about it.

The first one is that the amount of storage is overkill if the output can
only emit an average over a few periods. I mean, the purpose of stats is
to emit what we know internally. Some people might want to see historgrams,
and while we have everything internally with your patch, it's not possible
to produce them.

For this reason I think we should proceed differently and always emit these
stats over a few hard-coded periods. You proved that they don't take that
much space, and I think it would make sense probably to emit them over a
small series of power of 4 seconds : 1s, 4s, 16s, 64s. That's quite cheap
to store and easy to compute because it's not needed anymore to store all
individual values, you can cascade them while filling a bucket.

And if you go down that route then there's no need anymore for adding yet
another setting in the configuration, it will be done by default. Another
point regarding this setting "stats-period" is that you currently do not
support the "defaults" section while I guess almost all users will want to
have it there. That's another reason for keeping it hard-coded.

Now some general comments and guidance on the code itself :

> diff --git a/include/proto/stats.h b/include/proto/stats.h
> index ac893b8..f33ceb1 100644
> --- a/include/proto/stats.h
> +++ b/include/proto/stats.h
> @@ -95,7 +95,8 @@ int stats_fill_fe_stats(struct proxy *px, struct field 
> *stats, int len);
>  int stats_fill_li_stats(struct proxy *px, struct listener *l, int flags,
>                          struct field *stats, int len);
>  int stats_fill_sv_stats(struct proxy *px, struct server *sv, int flags,
> -                        struct field *stats, int len);
> +                        struct field *stats, int len,
> +                        struct stream_interface *si);

Please avoid re-introducing the stream_interface in stats, we're trying hard
to abstract it. From what I'm seeing you only need it to know if there's a
different period. Better pass the period directly in argument from the caller.

> diff --git a/include/types/counters.h b/include/types/counters.h
> index 06ce617..a33e01f 100644
> --- a/include/types/counters.h
> +++ b/include/types/counters.h
> @@ -105,6 +105,41 @@ struct be_counters {
>       } p;                                    /* protocol-specific stats */
>  };
>  
> +/*
> + * Struct time-stat-counters:
> + *
> + * Implement a last-n seconds based counters for backend/server's q/c/r/t
> + * times. Total memory consumption on x86_64 to maintain all 4 event time
> + * values per backend/server is: 16 + #seconds * 56 bytes.
> + *   for 1 second:   72 bytes
> + *   for 16 seconds: 912 bytes
> + *   for 32 seconds: 1808 bytes
> + *   for 64 second:  3600 bytes
> + */
> +struct __tstat_counters {

Avoid struct names using '__' as a prefix. In general we use such a prefix
for functions that people should use with extreme caution (eg: functions
which do not check for nulls or do not lock). Here there's no reason for
this.

> +     unsigned int timestamp;         /* time when this entry was added */
> +     long t_queue;                   /* sum of queue times */
> +     long t_connect;                 /* sum of times to connect to server */
> +     long t_data;                    /* sum of times to get response */
> +     long t_close;                   /* sum of session times */
> +     int t_queue_counter;            /* total number of 'queue' entries */
> +     int t_connect_counter;          /* total number of 'connect' entries */
> +     int t_data_counter;             /* total number of 'response' entries */
> +     int t_close_counter;            /* total number of 'session' entries */
> +};
> +
> +struct tstat_counters {
> +     int max_entries;                /* max number of 'times' entries */
> +     struct __tstat_counters *times; /* Stats for every second */
> +};

(...)
> diff --git a/include/types/proxy.h b/include/types/proxy.h
> index 3bf5a4d..97c5732 100644
> --- a/include/types/proxy.h
> +++ b/include/types/proxy.h
> @@ -365,6 +365,8 @@ struct proxy {
>       struct be_counters be_counters;         /* backend statistics counters 
> */
>       struct fe_counters fe_counters;         /* frontend statistics counters 
> */
>  
> +     struct tstat_counters be_tstat_counters; /* last 'n' time counters for 
> backend */
> +

Given that you use it both in the backend and in the server, you should
probably move that directly to the struct be_counters. That will also get
rid of your intermediary tstats_counters struct.

> diff --git a/src/cfgparse.c b/src/cfgparse.c
> index 3ed2c22..b212ee6 100644
> --- a/src/cfgparse.c
> +++ b/src/cfgparse.c
> @@ -2551,6 +2551,50 @@ static void free_email_alert(struct proxy *p)
>       p->email_alert.myhostname = NULL;
>  }
>  
> +/*
> + * Parse "stats-period" option in backend specifying how many
> + * seconds of stats to keep per server/backend.
> + */
> +int parse_stats_period(const char *file, int linenum, char **args,
> +                    struct proxy *curproxy, struct proxy *defproxy)

Better register a new keyword from stats.c than further increasing cfgparse.c.
We're trying to deflate that huge beast by providing a more decentralized
parsing system. Please check "cfg_register_keywords()" for this.

> diff --git a/src/haproxy.c b/src/haproxy.c
> index 239db80..57536ea 100644
> --- a/src/haproxy.c
> +++ b/src/haproxy.c
> @@ -860,6 +860,12 @@ static void init(int argc, char **argv)
>          }
>  #endif
>  
> +     /* Enable time stats counters for those backends that asked for it */
> +     if (enable_be_tstat_counters()) {
> +             Alert("Unable to allocate backend time-counter stats.\n");
> +             exit(1);
> +     }
> +

At some point all proxies are started (start_proxies()). This should be
done at the same time instead of having it own loop.

> +/* Initialize data structure for a single server/backend */
> +int tstat_counters_init(struct tstat_counters *tstat, int num_entries)

Please always indicate the expected return values. That saves a lot of
time during debugging sessions, otherwise we always have to read the whole
function to know what 1 or 0 means for example. Here I'm seeing that 0 is
OK and non-zero indicates a failure.

> +{
> +     tstat->max_entries = num_entries;
> +     if (!num_entries) {
> +             tstat->times = NULL;
> +             return 0;
> +     }
> +
> +     tstat->times = calloc(num_entries, sizeof(*tstat->times));
> +     return tstat->times == NULL;
> +}
> +
> +/* Wrapper function for tstat_counters_init() for all backends */
> +int enable_be_tstat_counters(void)

Same here.

(...)
> diff --git a/src/stats.c b/src/stats.c
> index 976496e..571ec11 100644
> --- a/src/stats.c
> +++ b/src/stats.c
> @@ -216,6 +216,10 @@ const char *stat_field_names[ST_F_TOTAL_FIELDS] = {
>       [ST_F_INTERCEPTED]    = "intercepted",
>       [ST_F_DCON]           = "dcon",
>       [ST_F_DSES]           = "dses",
> +     [ST_F_QTIME_PERIOD]   = "qt_period",
> +     [ST_F_CTIME_PERIOD]   = "ct_period",
> +     [ST_F_RTIME_PERIOD]   = "rt_period",
> +     [ST_F_TTIME_PERIOD]   = "tt_period",

I think that if we go for the multiple periods output, it would make sense
to have the value directly in the name. For example, qt1, qt4, qt16, qt64
etc...

>  };
>  
>  /* one line of info */
> @@ -385,7 +389,7 @@ static int stats_dump_fields_typed(struct chunk *out, 
> const struct field *stats)
>   * reserved for the checkbox is ST_SHOWADMIN is set in <flags>. Some extra 
> info
>   * are provided if ST_SHLGNDS is present in <flags>.
>   */
> -static int stats_dump_fields_html(struct chunk *out, const struct field 
> *stats, unsigned int flags)
> +static int stats_dump_fields_html(struct chunk *out, const struct field 
> *stats, unsigned int flags, struct proxy *px)

Please don't add the proxy here. We've tried hard to remove any dependency
on internal information in the dump code so that all outputs can contain
all info. That's what will make it possible to have some JSON output for
example. The dump functions are just "output drivers" which translate the
contents of the stats table. Here the only reason why you need this is to
have the duration. Just add another metric in the stats which is the duration
and an external aggregator will even be able to use it and report it. But as
you see this is also eliminated if you produce a few values over different
periods.

(...)

> +/*
> + * Time stat entries have one of the following values:
> + *   - 0 (never used, or was set with a value of 0. May or may not be valid)
> + *   - -1 (time was not relevant at update, this entry is not used to
> + *           calculate the average)
> + *   - non-zero (used in the past, may be valid or invalid).
> + * To determine whether the value is useful or not, we compare current
> + * time to saved time. We cannot stop once we reach a time outside of the
> + * interval, as intermediate entries may have old values and not be updated
> + * if no updates took place at those times. This means going through the
> + * entire array, but it is better to do extra work at (rare) retrieval times
> + * (control plane), rather than updating unused entries at run time (data
> + * plane). This leads to fast updates and slower retrievals.
> + */
> +static void get_tstat_counters(struct tstat_counters *tstat,
> +                            struct stream_interface *si,
> +                            unsigned int now, struct field *qtimep,
> +                            struct field *ctimep, struct field *rtimep,
> +                            struct field *ttimep)
> +{
> +     int i;
> +     int64_t qtime = 0, ctime = 0, rtime = 0, ttime = 0;
> +     int64_t qcounter = 0, ccounter = 0, rcounter = 0, tcounter = 0;
> +     int duration = tstat->max_entries;
> +
> +     /*
> +      * Return zero for those backends that did not ask for time based
> +      * stats.
> +      */
> +     if (!tstat->times) {
> +             *qtimep = mkf_u32(FN_AVG, 0);
> +             *ctimep = mkf_u32(FN_AVG, 0);
> +             *rtimep = mkf_u32(FN_AVG, 0);
> +             *ttimep = mkf_u32(FN_AVG, 0);
> +             return;
> +     }
> +
> +     if (si && si->end) {
> +             /*
> +              * If user via CLI specified a specific time period over which
> +              * to report stats, then retrieve that value instead of the
> +              * default value specified in the haproxy configuration file.
> +              */
> +             struct appctx *appctx = objt_appctx(si->end);
> +
> +             if (appctx && (appctx->ctx.stats.flags & STAT_DURATION))
> +                     duration = MAX(MIN(duration,
> +                                        appctx->ctx.stats.duration), 0);
> +     }
> +
> +     /* Sum all entries to calculate time window based average */
> +     for (i = 0; i < tstat->max_entries; i++) {
> +             if (!within_period(tstat->times[i].timestamp, now, duration)) {
> +                     /* Outside the requested period, but cannot break out */
> +                     continue;
> +             }
> +
> +             /* Valid time period - increment counts/average */
> +             qtime += tstat->times[i].t_queue;
> +             ctime += tstat->times[i].t_connect;
> +             rtime += tstat->times[i].t_data;
> +             ttime += tstat->times[i].t_close;
> +
> +             qcounter += tstat->times[i].t_queue_counter;
> +             ccounter += tstat->times[i].t_connect_counter;
> +             rcounter += tstat->times[i].t_data_counter;
> +             tcounter += tstat->times[i].t_close_counter;
> +     }
> +
> +     /* Avoid divide-by-zero errors */
> +     if (!qcounter)
> +             qcounter = 1;
> +     if (!ccounter)
> +             ccounter = 1;
> +     if (!rcounter)
> +             rcounter = 1;
> +     if (!tcounter)
> +             tcounter = 1;
> +
> +     /* Convert average times to microsecond */
> +     *qtimep = mkf_u32(FN_AVG, qtime / qcounter);
> +     *ctimep = mkf_u32(FN_AVG, ctime / ccounter);
> +     *rtimep = mkf_u32(FN_AVG, rtime / rcounter);
> +     *ttimep = mkf_u32(FN_AVG, ttime / tcounter);

Above I'd proceed differently. Since the stats supports the "empty" type
(which is the default), I'd do it this way instead, which has both the
benefit of avoiding the divide by zero and not to emit any value when
there's none (instead of emitting zero) :

        /* Convert average times to microsecond */
        if (qcounter)
                *qtimep = mkf_u32(FN_AVG, qtime / qcounter);
        if (ccounter)
                *ctimep = mkf_u32(FN_AVG, ctime / ccounter);
        if (rcounter)
                *rtimep = mkf_u32(FN_AVG, rtime / rcounter);
        if (tcounter)
                *ttimep = mkf_u32(FN_AVG, ttime / tcounter);

> +/* Update time-stat counter values if enabled for this backend */
> +void set_tstat_counters(struct tstat_counters *tstat,
> +                     unsigned int now, int t_queue,
> +                     int t_connect, int t_data, int t_close)
> +{
> +     if (tstat->times) {
> +             unsigned int slot = now % tstat->max_entries;
> +
> +             /*
> +              * Optimize for heavy load when many requests are made in
> +              * the same second. During very low load, the following
> +              * 'likely' will become false more often.
> +              */
> +             if (likely(now == tstat->times[slot].timestamp)) {
(...)

Here you can make the code slightly simpler by first checking if the period
is different to reinitialize the slot to zero using a simple memset(), and
then go to the usual increment. That removes a significant amount of logic.

Cheers,
Willy

Re: [PATCH] MEDIUM/RFC: Implement time-based server latency metrics

Reply via email to