Re: Providing near-time averaged monitoring data for mod_systemd and mod_status

Joe Orton Mon, 27 Apr 2020 06:57:49 -0700

On Sat, Apr 25, 2020 at 08:10:40PM +0200, Rainer Jung wrote:
> Patch available at
> 
> home.apache.org/~rjung/patches/httpd-trunk-mon-snaps-v1_2.patch


Very nice!  +1 from me.

Does the times_per_thread logic still make any sense?  It's always been 
wrong for Linux AFAICT so maybe can just be dropped.

A minor nit, I think the snap_index should be read/written using 
apr_atomic_t since it's going to be accessed concurrently across 
threads?

Regards, Joe



> Some notes:
> 
> - in order to use the data from mod_systemd (monitor hook), which runs in
> the maim process, and also from mod_status, so from child processes, it
> needs to sit in shared memory. Since most of the data is tightly coupled to
> the scoreboard, I added the new cached data to the global part of the
> scoreboard.
> 
> - it was then IMHO a good fit to move the existing ap_get_sload(ap_sload_t
> *ld) from server/util.c to server/scoreboard.c.
> 
> - ap_sload_t is used as a collection of data which can be used to generate
> averaged data from a pair of ap_sload_t structs. It was extended to also
> contain the total request duration plus total usr and sys CPU and the
> timestamp when the data was taken. So it should now contain all data from
> which mod-systemd and mod_status currently like to display derived averaged
> values.
> 
> - busy and idle in ap_sload_t have been changed from int to float, because
> they were already only used a percentages. IMHO it doesn't make sense to
> express idle and busy as percentages based on a total of busy plus idle,
> because that sum is not a meaningful total. The server can grow by creating
> new processes and suddenly your busy percentage might shrink, although the
> absolute number of busy threads increases. That's counter intuitive. So I
> added the unused process slots to the total count, and we have three
> counters that sum up to this total, busy, idle and dead. We need a better
> name than "dead" for these unused process slots that might get used later.
> "unused" is to close to "idle" and I don't have a good name yet.
> 
> - the new ap_mon_snap_t contains a pointer to an ap_sload_t, six averaged
> values generated from two ap_sload_t and a member that conatins the time
> delta between those two ap_sload_t. The ap_sload_t referenced by
> ap_mon_snap_t contains the data at time t1. During the next monitor run, new
> t1 data will be collected and the previous t1 data will be used as t0 data
> to generate the new averages.
> 
> - the scoreboard contains two ap_mon_snap_t (plus the two ap_sload_t
> referenced by them) to be able to update one of them without breaking access
> by consumers to the other one. After the update the roles get switched.
> 
> - both structs, ap_sload_t and ap_mon_snap_t are declared in httpd.h,
> because ap_sload_t already had been there. t might be a better fit to move
> them to scoreboard.h, but I'm not sure about that and I don't know if that
> would be allowed by the compatibility rules.
> 
> - also in httpd.h there are now three new function declarations. Two only
> used by server/core.c as hook functions:
> 
>   int ap_scoreboard_monitor(apr_pool_t *p, server_rec *s);
>   void ap_scoreboard_child_init(apr_pool_t *p, server_rec *s);
> 
> and one for public consumption:
> 
>   AP_DECLARE(void) ap_get_mon_snap(ap_mon_snap_t *ms);
> 
> - mod_systemd and mod_status now call ap_get_mon_snap(&snap) to retrieve the
> latest averaged data. mod_status still uses the scoreboard in addition to
> retrieve up-to-date current scalar metrics. Small adjustments to the
> mod_status output plus additions to mod_systemd notification messages. Tne
> STATUS in the notification of mod_systemd now looks liek this:
> 
> STATUS=Pct Idle workers 28.4; Pct Busy workers 10.6; Pct Dead workers 60.9;
> Requests/sec: 5030; Bytes served/sec: 2.9MB/sec; Bytes served/request: 596
> B/req; Avg Duration(ms): 5.78; Avg Concurrency: 29.1; Cpu Pct: 40.5
> 
> - scoreboard.c changes:
> 
>   - take over ap_get_sload(ap_sload_t *sl) from server/util.c.
>     Added copied code from mod_status to that function to also add up the
> request duration and the usr and sys CPU times.
> 
>   - ap_scoreboard_monitor() runs in the monitor hook. It calls
> ap_get_sload() and then a static utility function calc_mon_data() to
> calculate the new averages.
> 
> - Minor mmn change not yet part of the patch.
> 
> It compiles fine (maintainer mode) on RHEL 7 x86_64 and on Solaris 10 Sparc
> and I did some tests with mod_status and mod_systemd.
> 
> Regards,
> 
> Rainer
> 
> Am 24.04.2020 um 18:32 schrieb Rainer Jung:
> > Am 24.04.2020 um 16:21 schrieb Joe Orton:
> > > On Fri, Apr 24, 2020 at 12:17:19PM +0200, Rainer Jung wrote:
> > > > Thinking further: I think it would make sense to have a module or core
> > > > implement the monitor hook to generate that derived data (requests/sec,
> > > > bytes/sec, durationMs/request, avgConcurrency) in the last
> > > > monitor interval
> > > > and to provide that data to consumers like mod_systemd or - new
> > > > - mod_status
> > > > - instead of the long term averages since start. It could
> > > > probably be added
> > > > to the code that already provides "sload". That way mod_status
> > > > would also
> > > > profit from the more precise average values (taken over the last monitor
> > > > interval).
> > > 
> > > I definitely like the patch, it has bothered me that the "per second"
> > > stats are not very useful but wasn't sure how to make it better.
> > > 
> > > This is also an interesting idea.
> > > 
> > > So you would suggest having a new monitor hook which runs REALLY_LAST in
> > > the order, calls ap_get_sload() and stores it in a global, and then we'd
> > > have an ap_get_cached_sload() (or whatever) which gives you the cached
> > > data from the last iteration?  Or are you thinking of a more
> > > sophisticated API which does the "diff" between intervals internally?
> > 
> > Thanks for the positive feedback.
> > 
> > The averaged metrics IMHO only make sense as cached data, updated in
> > regular intervals and provided for use by various modules (probably only
> > mod_systemd and mod_status).
> > 
> > I would like to provide the already averaged data in a struct, each
> > metric as a float or double. The bytes/request probably not already
> > human readably scaled, because it makes its use less flexible. Since we
> > already also have the absolute counters at that point, we can easily add
> > them to the same struct as 32 or 64 bit counters and return a consistent
> > set of data (five old values, five new values, five averages and two
> > time stamps). [idle(32), busy(32), requests(64), bytes(64),
> > duration(64); req/s, bytes/s, bytes/req, dur/s, dur/req]. So consumers
> > needing a consistent view can get it.
> > 
> > Even more so since the absolute metrics are currently not cheap to
> > access. We collect all of them by iterating over the scoreboard and
> > summing up. By adding them to the cached data, the consuming code could
> > decide, whether such near-time data is good enough or it needs to
> > acquire new and curent counters. For mod_systemd, cached data (10 second
> > interval) might be OK.
> > 
> > For some modules - like mod_status - cached averages are fine, but I
> > think the counters should be correct for the point in time the status
> > request was handled by the module. So the scoreboard statistics code in
> > mod_status unfortunately would not go away.´, but the data quality for
> > the averages would become better.
> > 
> > Implementation wise I am thinking about adding
> > 
> >    ap_hook_monitor(mon_avg_monitor, NULL, NULL, ???);
> > 
> > to server/util.c, which calculates the new averages and
> > 
> >    ap_get_mon_avg(ap_mon_avg_t *ma)
> > 
> > which returns the four averages in a struct similar to the existing
> > ap_get_loadavg() and ap_get_sload().
> > 
> > We might have a little hassle to make the statistics update
> > atomic/thread-safe (eg. two instances of the internal data struct, so
> > that we only need to make the switch between them after the new
> > calculation atomic).
> > 
> > About REALLY_LAST: why last? If other modules collect data via this API
> > and wasn't to do it in the monitor hook as well, shouldn't run the
> > caching of data REALLY_FIRST, so you get the new averages?
> > 
> > I'll try to draft and test something along these lines later today. Fun
> > stuff. And more comments are very welcome.
> > 
> > And I like, that mod_status will profit by showing betrer averages as well.
> > 
> > Regards,
> > 
> > Rainer
> > 
> > > > Am 23.04.2020 um 21:29 schrieb Rainer Jung:
> > > > > Hi all,
> > > > > 
> > > > > triggered by the new mod_systemd I drafted a patch to enhance the
> > > > > monitoring data it provides during the monitor hook run.
> > > > > 
> > > > > Currently it publishes important data, like idle and busy slots and
> > > > > total request count, but also not so useful info like requests/second
> > > > > and bytes/second as a long term average (since start). These two 
> > > > > figues
> > > > > tend to become near constant after a longer time of operation.
> > > > > 
> > > > > Since the monitor hook of the module always seems to run in the same
> > > > > (parent) process, it is easy to remember the previous request and byte
> > > > > count data and average only over the last monitor hook interval. This
> > > > > should give more meaningful data. And is a change local to 
> > > > > mod_systemd.
> > > > > 
> > > > > In addition we have a third metric available in the scoreboard, namely
> > > > > the total request duration. From that we can get the average request
> > > > > duration and the average request concurrency. This part also needs a
> > > > > change to the sload structure. Maybe we need a minor MMN bump for 
> > > > > that.
> > > > > 
> > > > > I scetched a patch under
> > > > > 
> > > > > home.apache.org/~rjung/patches/httpd-trunk-mod_systemd-interval-stats.patch
> > > > > 
> > > > > 
> > > > > Any comments, likes or dislikes?
> > > > > 
> > > > > Thanks and regards,
> > > > > 
> > > > > Rainer
>

Re: Providing near-time averaged monitoring data for mod_systemd and mod_status

Reply via email to