On 09/06/2014, at 1:50 PM, Jason Garber <[email protected]> wrote:
> Can you comment on how this mechanism could provide data to a custom 
> monitoring application and the plans for extending it to cover daemon mode 
> process information?
> 
What this feature relies on is an ability within mod_wsgi itself to provide a 
snapshot of what is called the Apache scoreboard.

This Apache scoreboard is a shared memory segment used by Apache to keep track 
of the status of each worker thread across all Apache processes. Apache itself 
uses this information to determine how busy the child processes hosting those 
worker threads are and, depending on the MPM settings, will use it to increase 
or decrease the number of child processes running.

If mod_status is loaded, additional information about the number of requests 
handled by Apache is also kept within the scoreboard. Enable ExtendedStatus and 
even more information is tracked.

Just as mod_status itself does when it exposes an external URL such as 
/server-status, the plugin uses the scoreboard data to build a picture of what 
Apache is doing.

The difference between what the plugin is doing and what can be done from an 
external monitoring system using the exposed URL is that the plugin can poll 
at a regular one second interval without needing to make a web request, which 
would itself be reflected in server traffic. By polling more frequently it can 
build up a better picture, including using extra detail available directly 
from the scoreboard to grab sample data on actual requests and so generate 
response time averages and percentiles.
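
As a rough illustration only, turning sampled request durations into an 
average and percentiles could look something like the following Python. This 
is a minimal sketch and not the plugin's actual code: the summarise() function, 
the choice of the nearest-rank method and the sample values are all 
assumptions made for the example.

```python
import math


def summarise(durations):
    """Return the average and nearest-rank percentiles for durations in seconds."""
    if not durations:
        return None
    ordered = sorted(durations)

    def percentile(p):
        # Nearest-rank: the smallest sample with at least p% of samples at or below it.
        rank = max(1, math.ceil(p / 100.0 * len(ordered)))
        return ordered[rank - 1]

    return {
        "average": sum(ordered) / len(ordered),
        "p50": percentile(50),
        "p95": percentile(95),
        "p99": percentile(99),
    }


# Hypothetical durations sampled from the scoreboard over one reporting period.
samples = [0.010, 0.012, 0.015, 0.020, 0.250]
summary = summarise(samples)
```

With only a handful of samples the higher percentiles collapse onto the 
slowest request, which is one reason more frequent polling helps.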

The short polling interval also allows the number of active child processes to 
be monitored closely, making it possible to derive metrics for things such as 
process churn for the child processes, as well as more accurate metrics about 
server restarts and capacity utilisation.
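
To make the idea of process churn concrete, one way it might be derived is by 
comparing the set of child process IDs seen in two successive snapshots. This 
is only a sketch under that assumption: process_churn() and the PID lists are 
hypothetical, and it says nothing about how the plugin really reads the 
scoreboard.

```python
def process_churn(previous_pids, current_pids):
    """Compare two snapshots of child process IDs taken one poll apart."""
    previous, current = set(previous_pids), set(current_pids)
    return {
        "started": len(current - previous),  # PIDs only in the newer snapshot
        "exited": len(previous - current),   # PIDs that disappeared
        "active": len(current),
    }


# Hypothetical PIDs from two polls one second apart.
churn = process_churn([101, 102, 103], [102, 103, 104, 105])
```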

The intention in monitoring mod_wsgi is for mod_wsgi to have its own form of 
scoreboard using shared memory which tracks a snapshot of what is going on 
across all processes, whether the WSGI application is running in embedded mode 
or daemon mode.

This would allow tracking of details such as response time measured within the 
WSGI application independent of front end time, the queueing time, which is how 
long passes between Apache accepting the request and the WSGI application 
getting to handle it, plus separate measures of capacity utilisation for each 
mod_wsgi daemon process group. Other metrics which could be thrown into this 
might include queue depth for daemon processes, queue timeout rate, daemon 
connection failure rates and so on.
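
As a sketch of how two of those measures might be calculated once the raw 
timings are available, consider the following. The function names, the 
definition of utilisation as busy thread time over available thread time, and 
the numbers are all assumptions for illustration, not something mod_wsgi 
provides today.

```python
def queue_time(accepted_at, handled_at):
    """Seconds between Apache accepting a request and the WSGI application starting on it."""
    return max(0.0, handled_at - accepted_at)


def capacity_utilisation(busy_seconds, threads, period):
    """Fraction of available thread time spent handling requests during the period."""
    available = threads * period
    if available <= 0:
        return 0.0
    return min(1.0, sum(busy_seconds) / available)


# Hypothetical figures: three requests handled in a one second period by a
# daemon process group with five threads in total.
waited = queue_time(accepted_at=100.0, handled_at=100.25)
utilisation = capacity_utilisation([0.5, 0.25, 0.25], threads=5, period=1.0)
```

Here one second of busy time out of five available thread seconds gives a 
utilisation of 20 percent.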

If a module such as psutil were available, then possibly the plugin could also 
track and report per process memory usage, CPU usage and the number of 
process context switches. In essence, anything I can think of that would help 
to supplement data on throughput and response times to work out whether 
changing the processes/threads mix is actually having some form of positive 
effect.
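
For what it is worth, gathering those figures with psutil might look roughly 
like this. It is a minimal sketch that assumes psutil may or may not be 
installed; sample_process() and the dictionary keys are hypothetical names, 
while the psutil calls used (Process, memory_info, cpu_times and 
num_ctx_switches) are part of psutil itself.

```python
import os

try:
    import psutil  # third party module; treated as optional here
except ImportError:
    psutil = None


def sample_process(pid):
    """Return memory, CPU and context switch figures for one process, or None."""
    if psutil is None:
        return None
    try:
        proc = psutil.Process(pid)
        memory = proc.memory_info()
        cpu = proc.cpu_times()
        switches = proc.num_ctx_switches()
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        return None
    return {
        "rss_bytes": memory.rss,
        "cpu_user_seconds": cpu.user,
        "cpu_system_seconds": cpu.system,
        "voluntary_ctx_switches": switches.voluntary,
        "involuntary_ctx_switches": switches.involuntary,
    }


# Sample the current process; the result is None if psutil is not installed.
sample = sample_process(os.getpid())
```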

So my plans for future work in trying to achieve all that are as follows.

1. Refactor the current plugin code, which doesn't have a clean separation 
between deriving the metrics and reporting them up to New Relic, such that 
there is a distinct layer between the two.

2. Implement the equivalent of a scoreboard for mod_wsgi itself in order to be 
able to accumulate the additional information required and enhance the metrics 
generation code and the current New Relic plugin to match.

3. Create an optionally enabled internal consumer of the metrics which would 
retain a working history of metrics for a period of 30 minutes, but only within 
the collecting process itself, this process being a dedicated daemon process 
group set up to collect the data. Part of this would involve a minimal REST API 
to retrieve raw metric data from the process in some way.

4. Create as a proof of concept an extension for Django Debug Toolbar which can 
query the historical data from the in memory cache using the REST API. I intend 
doing this purely as an example, though, to support a talk I will be giving at 
PyCon AU in August on how Python web application toolbars work. Part of the 
talk will be about the usefulness or applicability of debug toolbars to a 
production environment, and I can see this proof of concept helping me to 
illustrate some points about the problem of a debug toolbar being of use in a 
multi host deployment.
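
To give a feel for the working history described in item 3, an in memory 
store that keeps only the last 30 minutes of samples could be sketched as 
below. This is purely illustrative: the MetricHistory class, its methods and 
the sample data are hypothetical, and the REST API that would sit in front of 
it is left out entirely.

```python
import time
from collections import deque


class MetricHistory:
    """Keep metric samples in memory, discarding any older than max_age seconds."""

    def __init__(self, max_age=30 * 60):
        self.max_age = max_age
        self.samples = deque()  # (timestamp, metrics) pairs, oldest first

    def add(self, metrics, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, metrics))
        self._expire(now)

    def since(self, start, now=None):
        """Return the retained samples recorded at or after the start timestamp."""
        now = time.time() if now is None else now
        self._expire(now)
        return [(ts, m) for ts, m in self.samples if ts >= start]

    def _expire(self, now):
        # Drop samples from the old end until all are within the retention window.
        cutoff = now - self.max_age
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()


# Hypothetical samples, using explicit timestamps in seconds for clarity.
history = MetricHistory()
history.add({"requests": 10}, now=0)
history.add({"requests": 12}, now=1000)
history.add({"requests": 9}, now=2000)
recent = history.since(500, now=2000)
```

By the time the third sample is added the first is more than 30 minutes old, 
so it has already been dropped from the history.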

That is as much as I have planned at this point.

Things I have no intention of doing are the following.

1. Creating any plugin to report data to any other charting system such as 
Graphite.

2. Creating a database for long term persistence of data.

3. Creating any chart visualisation system of my own to view the metric data, 
beyond any minimal experiments I may do to support the Django Debug Toolbar 
experiment.

The reason I am not doing any of these is that they are outside of my area of 
expertise. I have never used tools such as Graphite. I am not a database 
person, nor am I a front end web developer or Javascript developer.

I know well from my work at New Relic how much time and effort needs to go into 
creating a professional production quality backend system for retaining and 
visualising metric data and even if I had the skills in those areas it would be 
an amazingly huge time suck which would totally dwarf any time I am even able 
to spend on progressing mod_wsgi itself.

Since my experience lies in the area of Apache, WSGI servers and instrumenting 
for and collecting metric data, I will keep to that area. Doing so is just the 
most practical thing I can do as that is where I will be most productive and 
can do the most.

Those areas are also the ones I am interested in and enjoy working in. I don't 
find database and front end web design to be that interesting and given that my 
impetus for doing any work these days is a personal requirement or because I 
enjoy the technical challenge in a specific problem, then I will as a result be 
staying well clear of those areas.

Personally I have no issue if others want to pursue those things I have no 
interest in and certainly the way I intend refactoring the code would allow 
anyone to develop their own plugins to get the metrics out and into some other 
system.

In saying that, please don't take this as me saying 'patches welcome' and 
otherwise buzz off. I hate the way that some Open Source projects will say that 
when they don't have time to do something themselves. Reality is that I am time 
poor and I simply need to focus my time in the best way I can.

If you are genuinely interested in trying to fill in those areas where I feel I 
can't do a good job or don't have the time, I will not stop you nor make it 
difficult and will actually be as accommodating as I can and make it easier for 
you to get the data out and also advise on what would be the best way to do 
something.

What I simply am not in a position to do is lead such an initiative. My own 
priorities and interests will always take precedence and I have come to learn 
that I must do that if I am to avoid becoming burnt out again in respect of the 
work I do on mod_wsgi. So it is due to a measure of self preservation that I 
take this stance.

Hope that all makes sense and gives you a better idea of where I am heading and 
why I am restricting myself to that.

Graham
