Graham Leggett wrote:

> On a number of occasions recently I have run into the need to run some kind of garbage collection within httpd, either in a dedicated process, or a dedicated thread.

  I've also written a few modules where each child process runs
a private thread in the background.  I'd suggest there are perhaps
three variants here (in a Unix-like context, anyway): one separate
process, one thread in the master process, or one thread per child
process.  Presumably MPMs should somehow indicate which of these they
support.
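One way an MPM could "indicate which of these they support" is a capability mask that modules check before asking for a background task. The following is a minimal sketch under that assumption. The BG_* bits, bg_variant_supported() and bg_choose_variant() are all hypothetical. httpd's real ap_mpm_query() already answers related questions such as AP_MPMQ_IS_THREADED, but it has no bits like these:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits an MPM could advertise; not a real httpd API. */
enum bg_variant {
    BG_SEPARATE_PROCESS = 1 << 0, /* one dedicated background process */
    BG_MASTER_THREAD    = 1 << 1, /* one thread in the master process */
    BG_PER_CHILD_THREAD = 1 << 2  /* one thread in each child process */
};

/* Nonzero if every bit in `wanted` is present in the MPM's mask. */
static int bg_variant_supported(unsigned mpm_caps, unsigned wanted)
{
    assert(wanted != 0);
    return (mpm_caps & wanted) == wanted;
}

/* Walk the module's preference list and return the first variant the MPM
 * supports, or 0 if none fits (the module could then refuse to start). */
static unsigned bg_choose_variant(unsigned mpm_caps,
                                  const unsigned *prefs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (bg_variant_supported(mpm_caps, prefs[i]))
            return prefs[i];
    }
    return 0;
}
```

A module that would rather have a master-process thread but can live with a per-child thread just lists both, in order, and lets the MPM's mask decide.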

> ap_cron_per_second
> ap_cron_per_minute
> ap_cron_per_hour
> ap_cron_per_day
> ap_cron_per_week

  I wonder if you could flatten these down to a single
ap_monitor_interval() kind of thing, where the module specified
the interval it wanted?
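Flattening the five hooks into one might look something like the following minimal sketch. It assumes a hypothetical registry driven by a once-a-second tick from the MPM. monitor_register(), monitor_tick(), count_call() and the fixed-size task table are invented for illustration, and time is passed in as plain seconds to keep it testable:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-hook interface replacing ap_cron_per_second,
 * ap_cron_per_minute, etc.: each module states the interval it wants. */
typedef void (*monitor_fn)(void *data);

struct monitor_task {
    long interval;   /* seconds between runs; 60 replaces "per minute" */
    long next_run;   /* absolute time (seconds) of the next due run */
    monitor_fn fn;
    void *data;
};

#define MAX_TASKS 16
static struct monitor_task tasks[MAX_TASKS];
static size_t ntasks;

/* Register fn to run every `interval` seconds, first due at now + interval.
 * Returns 0 on success, -1 if the table is full. */
static int monitor_register(long now, long interval, monitor_fn fn, void *data)
{
    assert(interval > 0 && fn != NULL);
    if (ntasks == MAX_TASKS)
        return -1;
    tasks[ntasks++] = (struct monitor_task){ interval, now + interval, fn, data };
    return 0;
}

/* Called periodically by the MPM (e.g. once a second); runs every task
 * whose time has come and schedules its next run. */
static void monitor_tick(long now)
{
    for (size_t i = 0; i < ntasks; i++) {
        struct monitor_task *t = &tasks[i];
        if (now >= t->next_run) {
            t->fn(t->data);
            /* Skip missed slots rather than running a burst to catch up. */
            while (t->next_run <= now)
                t->next_run += t->interval;
        }
    }
}

/* Trivial example callback: counts how often it has been invoked. */
static void count_call(void *data) { ++*(int *)data; }
```

If a tick is late, the task runs once and then realigns to its schedule rather than firing several times in a row. That choice seems friendlier for garbage-collection style work, though a module could reasonably want the opposite.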

  I suppose one could go the other direction toward a full-blown
scheduler, but that seems like a lot of extra effort for perhaps
little gain.

  It might be nice to also offer the option to randomly stagger the
creation of processes/threads within some additional time interval --
especially for one-per-child threads, that could help avoid a
"thundering herd" kind of situation where a bunch of child processes
start up together (e.g., at a restart), and later all kick off
resource-intensive threads at nearly the same time.  Of course a
module's background threads could wait some random interval on their own
after being started, but then they're eating up time sleeping while the
invoking process thinks they're doing work.
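If the MPM (rather than the thread itself) applied the stagger, it could simply schedule each child's first run at start time plus a per-child offset. Below is a minimal sketch, assuming each child has some distinct seed such as its PID or scoreboard slot. stagger_offset() and first_run_at() are hypothetical, and the mixing function is the standard SplitMix64 finalizer, swapped in here as a cheap hash. Note that plain rand() would not work: children forked from one parent inherit identical PRNG state and would all pick the same "random" delay.

```c
#include <assert.h>
#include <stdint.h>

/* SplitMix64 finalizer: scrambles nearby seeds (PIDs, slot numbers)
 * into unrelated-looking 64-bit values. */
static uint64_t mix64(uint64_t x)
{
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* Hypothetical: offset in [0, window) seconds for one child's task. */
static long stagger_offset(uint64_t child_seed, long window)
{
    assert(window > 0);
    return (long)(mix64(child_seed) % (uint64_t)window);
}

/* Hypothetical: when the MPM should first launch this child's task. */
static long first_run_at(long start, uint64_t child_seed, long window)
{
    return start + stagger_offset(child_seed, window);
}
```

Because the MPM holds the task until its offset expires, nothing sits sleeping while it appears to be busy.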


  My other thought is that it would be really nice to be able to
track the status of these processes and threads in mod_status; e.g.,
in the scoreboard or a scoreboard-like utility.  Moreover, it would
be excellent if the scheduler/MPM could use this info to avoid spawning
new processes/threads if the old ones were still executing ... for a
per-second kind of interval, that might be quite important, especially
if there's any chance the task could occasionally get "stuck".

  I feel this relates a bit to my continued interest (modulo lack of time)
in abstracting the scoreboard and shared-memory subsystems into a
"shared map" facility.  Joe Orton did a bunch of work on the SSL
session cache which I think moves it in this direction; see his
RFC and a couple of my responses:

http://marc.info/?l=apache-httpd-dev&m=120397759902722&w=2
http://marc.info/?l=apache-httpd-dev&m=120406346306713&w=2
http://marc.info/?l=apache-httpd-dev&m=120491055413781&w=2

  I also put a larger outline of some of my notions in this regard into
this document:

http://svn.apache.org/viewvc/httpd/sandbox/amsterdam/architecture/scoreboard.txt?view=markup

  In particular, in thinking about background processes and threads
both of the type you're proposing and those created by modules like
mod_cgid and mod_fcgid, I had begun tossing around the notion of modules
being able to register additional scoreboard states beyond those
currently hard-coded into mod_status:

 - during pre/check/post-config phases, modules (including MPMs)
   may indicate if they need IPC space, what type, and how much:

   - private space in scoreboard table values
   - additional scoreboard states

 - at startup, the master process initializes the storage provider:

   - MPM sizes scoreboard based on runtime process and thread limits,
     not compile-time maximums
   - assigns IDs for additional scoreboard states requested by modules
   - creates scoreboard state-to-ID hash mappings in regular memory
     as part of read-only configuration data inherited by children
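The two startup steps for extra states might look roughly like the following minimal sketch. It assumes modules name their states with strings and that module IDs are numbered after the built-in ones, much as scoreboard.h's SERVER_* constants are today. sb_request_state(), sb_assign_ids(), sb_state_id() and BUILTIN_STATES (with its value of 11) are hypothetical, and a linear scan stands in for the proposed hash mapping:

```c
#include <assert.h>
#include <string.h>

#define BUILTIN_STATES 11   /* illustrative: IDs 0..10 stay hard-coded */
#define MAX_EXTRA      32

static const char *extra_names[MAX_EXTRA];
static int n_extra;
static int frozen;          /* set once the master has assigned IDs */

/* Config phase: a module asks for a named state; duplicates collapse.
 * Returns 0 on success, -1 if the table is full. */
static int sb_request_state(const char *name)
{
    assert(!frozen);
    for (int i = 0; i < n_extra; i++)
        if (strcmp(extra_names[i], name) == 0)
            return 0;
    if (n_extra == MAX_EXTRA)
        return -1;
    extra_names[n_extra++] = name;
    return 0;
}

/* Startup: the master freezes the table, so each state's ID becomes
 * BUILTIN_STATES + its registration index.  Children forked afterwards
 * inherit the table as read-only configuration. */
static void sb_assign_ids(void) { frozen = 1; }

/* Runtime lookup: ID for a module-defined state, or -1 if unknown. */
static int sb_state_id(const char *name)
{
    assert(frozen);
    for (int i = 0; i < n_extra; i++)
        if (strcmp(extra_names[i], name) == 0)
            return BUILTIN_STATES + i;
    return -1;
}
```

Since the table is built before any children exist and never changes afterwards, lookups in the children need no locking.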

  We could then offer modules standard ways to ask for processes or
threads to be spawned at startup/restart time, or on some schedule
(as per your initial proposal), and for these processes/threads to
update their status record in the scoreboard.  Certain MPMs (e.g.,
worker, event) also spawn threads that aren't recorded in the scoreboard
at the moment; it would be great to see them in the scoreboard too.

  The administrator could then see everything at a glance in
mod_status, including all the background tasks, and the scheduler/MPM
would have a standard way to check if a background task was still
running from a previous invocation.

  I feel like there's a certain serendipity in this proposal coming
along around the same time as Joe Orton's work, which seems to me to
be heading toward not so much of a cache in the traditional httpd sense
(i.e., mod_cache and friends) as a generic "shared map" interface that
would be useful in a wide variety of ways, including implementing a
configurable scoreboard that could help track arbitrary background tasks.

  Thoughts, flames?  Fire away!  Thanks,

Chris.

--
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B
