Hi Willy, Sorry for the late response as I was out during the year end, and thanks once again for your review comments.
I explored your suggestion of "hard-coded periods", and have some problems: code complexity seems to be very high at updates (as well as retrievals possibly); and I may not be able to get accurate results. E.g. I have data for 1, 4, 16 seconds; and at 18 seconds, a request is made for retrieval of the last 16 seconds (or 1,4,16). At this time I have values for last 18 seconds not 16 seconds. I explored using timers to cascade (will not work as it may run into races with the setters, and also adds too much overhead) vs doing this synchronously when the event happens. Both are complicated and have the above issue of not able to get accurate information depending on when the request is made. To implement your suggestion of say histograms, the retrieval code can calculate the 4 values (1, 4, 16, and 64 seconds) by averaging across the correct intervals. In this case, the new CLI command is not required, and by default it prints all 4 values. Would this work in your opinion? Ack all your other suggestions, will incorporate those changes and re-send. Please let me know if this sounds reasonable. Thanks, - Krishna On Thu, Dec 22, 2016 at 4:23 PM, Willy Tarreau <[email protected]> wrote: > Hi Krishna, > > On Thu, Dec 22, 2016 at 09:41:49AM +0530, Krishna Kumar (Engineering) > wrote: > > We have found that the current mechanism of qtime, ctime, rtime, and > ttime > > based on last 1024 requests is not the most suitable to debug/visualize > > latency issues with servers, especially if they happen to last a very > short > > time. For live dashboards showing server timings, we found an additional > > last-'n' seconds metrics useful. The logs could also be parsed to derive > > these > > values, but suffers from delays at high volume, requiring higher > processing > > power and enabling logs. > > > > The 'last-n' seconds metrics per server/backend can be configured as > follows > > in the HAProxy configuration file: > > backend backend-1 > > stats-period 32 > > ... > > > > To retrieve these stats at the CLI (in addition to existing metrics), > run: > > echo show stat-duration time 3 | socat /var/run/admin.sock stdio > > > > These are also available on the GUI. > > > > The justification for this patch are: > > 1. Allows to capture spikes for a server during a short period. This > helps > > having dashboards that show server response times every few seconds > (e.g. > > every 1 second), so as to be able to chart it across timelines. > > 2. Be able to get an average across different time intervals, e.g. the > > configuration file may specify to save the last 32 seconds, but the > cli > > interface can request for average across any interval upto 32 seconds. > > E.g. > > the following command prints the existing metrics appended by the time > > based ones for the last 1 second: > > echo show stat-duration time 1 | socat /var/run/admin.sock stdio > > Running the following existing command appends the time-based metric > > values > > based on the time period configured in the configuration file per > > backend/server: > > echo show stat | socat /var/run/admin.sock stdio > > 3. Option per backend for configuring the server stat's time interval, > and. > > no API breakage to stats (new metrics are added at end of line). > > > > Please review, any feedback on the code/usability/extensibility is very > much > > appreciated. > > First, thanks for this work. I'm having several concerns and comments > however > about it. > > The first one is that the amount of storage is overkill if the output can > only emit an average over a few periods. I mean, the purpose of stats is > to emit what we know internally. Some people might want to see historgrams, > and while we have everything internally with your patch, it's not possible > to produce them. > > For this reason I think we should proceed differently and always emit these > stats over a few hard-coded periods. You proved that they don't take that > much space, and I think it would make sense probably to emit them over a > small series of power of 4 seconds : 1s, 4s, 16s, 64s. That's quite cheap > to store and easy to compute because it's not needed anymore to store all > individual values, you can cascade them while filling a bucket. > > And if you go down that route then there's no need anymore for adding yet > another setting in the configuration, it will be done by default. Another > point regarding this setting "stats-period" is that you currently do not > support the "defaults" section while I guess almost all users will want to > have it there. That's another reason for keeping it hard-coded. > > Now some general comments and guidance on the code itself : > > > diff --git a/include/proto/stats.h b/include/proto/stats.h > > index ac893b8..f33ceb1 100644 > > --- a/include/proto/stats.h > > +++ b/include/proto/stats.h > > @@ -95,7 +95,8 @@ int stats_fill_fe_stats(struct proxy *px, struct field > *stats, int len); > > int stats_fill_li_stats(struct proxy *px, struct listener *l, int flags, > > struct field *stats, int len); > > int stats_fill_sv_stats(struct proxy *px, struct server *sv, int flags, > > - struct field *stats, int len); > > + struct field *stats, int len, > > + struct stream_interface *si); > > Please avoid re-introducing the stream_interface in stats, we're trying > hard > to abstract it. From what I'm seeing you only need it to know if there's a > different period. Better pass the period directly in argument from the > caller. > > > diff --git a/include/types/counters.h b/include/types/counters.h > > index 06ce617..a33e01f 100644 > > --- a/include/types/counters.h > > +++ b/include/types/counters.h > > @@ -105,6 +105,41 @@ struct be_counters { > > } p; /* protocol-specific stats > */ > > }; > > > > +/* > > + * Struct time-stat-counters: > > + * > > + * Implement a last-n seconds based counters for backend/server's > q/c/r/t > > + * times. Total memory consumption on x86_64 to maintain all 4 event > time > > + * values per backend/server is: 16 + #seconds * 56 bytes. > > + * for 1 second: 72 bytes > > + * for 16 seconds: 912 bytes > > + * for 32 seconds: 1808 bytes > > + * for 64 second: 3600 bytes > > + */ > > +struct __tstat_counters { > > Avoid struct names using '__' as a prefix. In general we use such a prefix > for functions that people should use with extreme caution (eg: functions > which do not check for nulls or do not lock). Here there's no reason for > this. > > > + unsigned int timestamp; /* time when this entry was added > */ > > + long t_queue; /* sum of queue times */ > > + long t_connect; /* sum of times to connect to > server */ > > + long t_data; /* sum of times to get response */ > > + long t_close; /* sum of session times */ > > + int t_queue_counter; /* total number of 'queue' entries > */ > > + int t_connect_counter; /* total number of 'connect' > entries */ > > + int t_data_counter; /* total number of 'response' > entries */ > > + int t_close_counter; /* total number of 'session' > entries */ > > +}; > > + > > +struct tstat_counters { > > + int max_entries; /* max number of 'times' entries */ > > + struct __tstat_counters *times; /* Stats for every second */ > > +}; > > (...) > > diff --git a/include/types/proxy.h b/include/types/proxy.h > > index 3bf5a4d..97c5732 100644 > > --- a/include/types/proxy.h > > +++ b/include/types/proxy.h > > @@ -365,6 +365,8 @@ struct proxy { > > struct be_counters be_counters; /* backend statistics > counters */ > > struct fe_counters fe_counters; /* frontend statistics > counters */ > > > > + struct tstat_counters be_tstat_counters; /* last 'n' time counters > for backend */ > > + > > Given that you use it both in the backend and in the server, you should > probably move that directly to the struct be_counters. That will also get > rid of your intermediary tstats_counters struct. > > > diff --git a/src/cfgparse.c b/src/cfgparse.c > > index 3ed2c22..b212ee6 100644 > > --- a/src/cfgparse.c > > +++ b/src/cfgparse.c > > @@ -2551,6 +2551,50 @@ static void free_email_alert(struct proxy *p) > > p->email_alert.myhostname = NULL; > > } > > > > +/* > > + * Parse "stats-period" option in backend specifying how many > > + * seconds of stats to keep per server/backend. > > + */ > > +int parse_stats_period(const char *file, int linenum, char **args, > > + struct proxy *curproxy, struct proxy *defproxy) > > Better register a new keyword from stats.c than further increasing > cfgparse.c. > We're trying to deflate that huge beast by providing a more decentralized > parsing system. Please check "cfg_register_keywords()" for this. > > > diff --git a/src/haproxy.c b/src/haproxy.c > > index 239db80..57536ea 100644 > > --- a/src/haproxy.c > > +++ b/src/haproxy.c > > @@ -860,6 +860,12 @@ static void init(int argc, char **argv) > > } > > #endif > > > > + /* Enable time stats counters for those backends that asked for it > */ > > + if (enable_be_tstat_counters()) { > > + Alert("Unable to allocate backend time-counter stats.\n"); > > + exit(1); > > + } > > + > > At some point all proxies are started (start_proxies()). This should be > done at the same time instead of having it own loop. > > > +/* Initialize data structure for a single server/backend */ > > +int tstat_counters_init(struct tstat_counters *tstat, int num_entries) > > Please always indicate the expected return values. That saves a lot of > time during debugging sessions, otherwise we always have to read the whole > function to know what 1 or 0 means for example. Here I'm seeing that 0 is > OK and non-zero indicates a failure. > > > +{ > > + tstat->max_entries = num_entries; > > + if (!num_entries) { > > + tstat->times = NULL; > > + return 0; > > + } > > + > > + tstat->times = calloc(num_entries, sizeof(*tstat->times)); > > + return tstat->times == NULL; > > +} > > + > > +/* Wrapper function for tstat_counters_init() for all backends */ > > +int enable_be_tstat_counters(void) > > Same here. > > (...) > > diff --git a/src/stats.c b/src/stats.c > > index 976496e..571ec11 100644 > > --- a/src/stats.c > > +++ b/src/stats.c > > @@ -216,6 +216,10 @@ const char *stat_field_names[ST_F_TOTAL_FIELDS] = { > > [ST_F_INTERCEPTED] = "intercepted", > > [ST_F_DCON] = "dcon", > > [ST_F_DSES] = "dses", > > + [ST_F_QTIME_PERIOD] = "qt_period", > > + [ST_F_CTIME_PERIOD] = "ct_period", > > + [ST_F_RTIME_PERIOD] = "rt_period", > > + [ST_F_TTIME_PERIOD] = "tt_period", > > I think that if we go for the multiple periods output, it would make sense > to have the value directly in the name. For example, qt1, qt4, qt16, qt64 > etc... > > > }; > > > > /* one line of info */ > > @@ -385,7 +389,7 @@ static int stats_dump_fields_typed(struct chunk > *out, const struct field *stats) > > * reserved for the checkbox is ST_SHOWADMIN is set in <flags>. Some > extra info > > * are provided if ST_SHLGNDS is present in <flags>. > > */ > > -static int stats_dump_fields_html(struct chunk *out, const struct field > *stats, unsigned int flags) > > +static int stats_dump_fields_html(struct chunk *out, const struct field > *stats, unsigned int flags, struct proxy *px) > > Please don't add the proxy here. We've tried hard to remove any dependency > on internal information in the dump code so that all outputs can contain > all info. That's what will make it possible to have some JSON output for > example. The dump functions are just "output drivers" which translate the > contents of the stats table. Here the only reason why you need this is to > have the duration. Just add another metric in the stats which is the > duration > and an external aggregator will even be able to use it and report it. But > as > you see this is also eliminated if you produce a few values over different > periods. > > (...) > > > +/* > > + * Time stat entries have one of the following values: > > + * - 0 (never used, or was set with a value of 0. May or may not be > valid) > > + * - -1 (time was not relevant at update, this entry is not used to > > + * calculate the average) > > + * - non-zero (used in the past, may be valid or invalid). > > + * To determine whether the value is useful or not, we compare current > > + * time to saved time. We cannot stop once we reach a time outside of > the > > + * interval, as intermediate entries may have old values and not be > updated > > + * if no updates took place at those times. This means going through the > > + * entire array, but it is better to do extra work at (rare) retrieval > times > > + * (control plane), rather than updating unused entries at run time > (data > > + * plane). This leads to fast updates and slower retrievals. > > + */ > > +static void get_tstat_counters(struct tstat_counters *tstat, > > + struct stream_interface *si, > > + unsigned int now, struct field *qtimep, > > + struct field *ctimep, struct field *rtimep, > > + struct field *ttimep) > > +{ > > + int i; > > + int64_t qtime = 0, ctime = 0, rtime = 0, ttime = 0; > > + int64_t qcounter = 0, ccounter = 0, rcounter = 0, tcounter = 0; > > + int duration = tstat->max_entries; > > + > > + /* > > + * Return zero for those backends that did not ask for time based > > + * stats. > > + */ > > + if (!tstat->times) { > > + *qtimep = mkf_u32(FN_AVG, 0); > > + *ctimep = mkf_u32(FN_AVG, 0); > > + *rtimep = mkf_u32(FN_AVG, 0); > > + *ttimep = mkf_u32(FN_AVG, 0); > > + return; > > + } > > + > > + if (si && si->end) { > > + /* > > + * If user via CLI specified a specific time period over > which > > + * to report stats, then retrieve that value instead of the > > + * default value specified in the haproxy configuration > file. > > + */ > > + struct appctx *appctx = objt_appctx(si->end); > > + > > + if (appctx && (appctx->ctx.stats.flags & STAT_DURATION)) > > + duration = MAX(MIN(duration, > > + appctx->ctx.stats.duration), 0); > > + } > > + > > + /* Sum all entries to calculate time window based average */ > > + for (i = 0; i < tstat->max_entries; i++) { > > + if (!within_period(tstat->times[i].timestamp, now, > duration)) { > > + /* Outside the requested period, but cannot break > out */ > > + continue; > > + } > > + > > + /* Valid time period - increment counts/average */ > > + qtime += tstat->times[i].t_queue; > > + ctime += tstat->times[i].t_connect; > > + rtime += tstat->times[i].t_data; > > + ttime += tstat->times[i].t_close; > > + > > + qcounter += tstat->times[i].t_queue_counter; > > + ccounter += tstat->times[i].t_connect_counter; > > + rcounter += tstat->times[i].t_data_counter; > > + tcounter += tstat->times[i].t_close_counter; > > + } > > + > > + /* Avoid divide-by-zero errors */ > > + if (!qcounter) > > + qcounter = 1; > > + if (!ccounter) > > + ccounter = 1; > > + if (!rcounter) > > + rcounter = 1; > > + if (!tcounter) > > + tcounter = 1; > > + > > + /* Convert average times to microsecond */ > > + *qtimep = mkf_u32(FN_AVG, qtime / qcounter); > > + *ctimep = mkf_u32(FN_AVG, ctime / ccounter); > > + *rtimep = mkf_u32(FN_AVG, rtime / rcounter); > > + *ttimep = mkf_u32(FN_AVG, ttime / tcounter); > > Above I'd proceed differently. Since the stats supports the "empty" type > (which is the default), I'd do it this way instead, which has both the > benefit of avoiding the divide by zero and not to emit any value when > there's none (instead of emitting zero) : > > /* Convert average times to microsecond */ > if (qcounter) > *qtimep = mkf_u32(FN_AVG, qtime / qcounter); > if (ccounter) > *ctimep = mkf_u32(FN_AVG, ctime / ccounter); > if (rcounter) > *rtimep = mkf_u32(FN_AVG, rtime / rcounter); > if (tcounter) > *ttimep = mkf_u32(FN_AVG, ttime / tcounter); > > > +/* Update time-stat counter values if enabled for this backend */ > > +void set_tstat_counters(struct tstat_counters *tstat, > > + unsigned int now, int t_queue, > > + int t_connect, int t_data, int t_close) > > +{ > > + if (tstat->times) { > > + unsigned int slot = now % tstat->max_entries; > > + > > + /* > > + * Optimize for heavy load when many requests are made in > > + * the same second. During very low load, the following > > + * 'likely' will become false more often. > > + */ > > + if (likely(now == tstat->times[slot].timestamp)) { > (...) > > Here you can make the code slightly simpler by first checking if the period > is different to reinitialize the slot to zero using a simple memset(), and > then go to the usual increment. That removes a significant amount of logic. > > Cheers, > Willy >

