On Sun, 2005-12-18 at 23:48 -0600, JT Smith wrote:
> I added in event handling with both Event
> and Event::Lib as separate trials.

I just used a short sleep with Time::HiRes between polling the database
for new jobs.
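
Roughly like the sketch below.  The fetch and handler callbacks are
made-up names standing in for the database query and the job code, and
max_idle is only there so the example terminates; a real daemon would
loop forever.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Poll for jobs, napping briefly whenever the fetch callback finds nothing.
# fetch and handler are hypothetical callbacks; in a real daemon, fetch
# would query the jobs table.  max_idle only bounds this sketch.
sub poll_for_jobs {
    my (%opt) = @_;
    my $fetch    = $opt{fetch};
    my $handler  = $opt{handler};
    my $interval = defined $opt{interval} ? $opt{interval} : 0.25;
    my $max_idle = defined $opt{max_idle} ? $opt{max_idle} : 3;

    my ($idle, $done) = (0, 0);
    while ($idle < $max_idle) {
        my $job = $fetch->();
        if (defined $job) {
            $handler->($job);
            $done++;
            $idle = 0;
        }
        else {
            sleep($interval);    # fractional seconds via Time::HiRes
            $idle++;
        }
    }
    return $done;
}
```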

> If everything else is running in Apache, why start a
> separate service to run these tasks?

Because everything else sounds much harder and less reliable.  That was
my reason.

> And again, I said I want to go crazy. Let's not figure out how else we
> could do that (I already know that), but how could we do it using
> Apache?

I came up with about a half dozen possible ways of doing our queue
system.  Only one of them used Apache as the only daemon.  I looked into
custom protocol handlers and the rest of the mod_perl 2 API and there's
nothing I can see that would make a time-based system possible.  It
would require rewriting some C code and probably changing things that
are not part of the module API.

The idea I had for handling events without all that goes like this:

Run a mod_perl server that the job submitters contact via HTTP. When a
process gets a request, it checks to see if there are enough listener
processes free for accepting jobs (as opposed to processing jobs).  If
there are not, it adds the job to the queue and goes back to listening
for requests.  If there are, it processes the job.  This ensures that
processing jobs does not starve the ability to accept new ones.
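
As a sketch, the decision looks something like this.  The $state hash
and its keys (total, working, min_listeners, queue) are hypothetical,
and a plain in-memory hash is a simplification: with real Apache
children the counters and the queue would have to live in shared
storage with locking.

```perl
use strict;
use warnings;

# Decide whether the process that received a submission should start
# working or just enqueue the job.  All field names are hypothetical.
sub accept_or_queue {
    my ($state, $job) = @_;

    # Listeners that would remain if this process started working.
    my $listeners_left = $state->{total} - $state->{working} - 1;

    if ($listeners_left < $state->{min_listeners}) {
        push @{ $state->{queue} }, $job;    # too few listeners: enqueue
        return 'queued';
    }

    $state->{working}++;                    # this process becomes a worker
    return 'working';
}
```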

A process which has started working on a job will loop (keeping the
current request alive), pulling jobs off the queue and working on them
until the queue is empty again, at which point it lets the request finish
and goes back to sleep.  In other words, new requests will start child
processes working, and all processes that get started will stay working
until the queue is empty again.
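
The worker side could be sketched like this.  The in-memory array
stands in for whatever shared queue is actually used, and $run is a
made-up callback for the job code.

```perl
use strict;
use warnings;

# Process the job that arrived with the request, then keep pulling jobs
# off the queue until it is empty.  Returns the number of jobs handled.
sub work_until_empty {
    my ($queue, $first_job, $run) = @_;

    my $count = 0;
    my $job   = $first_job;
    while (defined $job) {
        $run->($job);
        $count++;
        $job = shift @$queue;    # undef once the queue is empty
    }
    return $count;               # caller now lets the request finish
}
```

Jobs that other processes add to the queue while this one is busy get
picked up on the next pass through the loop, which is where the quick
pickup comes from.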

Pros 
      * No polling except by processes that have just finished a job and
        are deciding whether or not to exit.  (This may still be more
        than the polling done by a simple Perl daemon.)
      * Quick pickup of new jobs.
Cons 
      * Clustering would require some kind of custom load balancer that
        would know which machines were actually busiest.  This might
        involve reading the scoreboard, or something more complex.  Much
        harder than other approaches.
      * No obvious way to tell how many processes are working vs.
        listening.  Would probably need to use something like
        Cache::FastMmap to track this.
      * The whole idea is fairly hard to explain, which probably means
        it's too complex and will be hard to build and debug.
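
For the working vs. listening count, a rough sketch of the
Cache::FastMmap approach follows.  The key name 'working' and the
helper names are made up, and this is untested against a real server.
It relies on the module's get_and_set method to make the update atomic
across processes.

```perl
use strict;
use warnings;

# $cache is assumed to be a Cache::FastMmap object created before the
# children fork, for example (the path is hypothetical):
#   my $cache = Cache::FastMmap->new(
#       share_file => '/tmp/queue-state',
#       init_file  => 1,
#   );

# Atomically add $delta (+1 or -1) to the shared count of workers.
sub adjust_working {
    my ($cache, $delta) = @_;
    $cache->get_and_set('working', sub {
        my ($key, $current) = @_;
        my $new = ($current || 0) + $delta;
        return $new;
    });
    return;
}

# Number of processes currently working on jobs.
sub working_count {
    my ($cache) = @_;
    return $cache->get('working') || 0;
}

# Processes left to accept requests, given the total number of children.
sub listening_count {
    my ($cache, $total) = @_;
    return $total - working_count($cache);
}
```

A process would call adjust_working($cache, 1) before starting on a job
and adjust_working($cache, -1) once the queue is empty.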

Anyway, feel free to expand on this idea or try it out.  As complex as
it is, it avoids having to delve into the guts of the httpd code.

- Perrin
