Stephen Adkins writes:
> The server(s) connect to a mainframe and perform time-consuming,
> repetitive transactions to collect the data that has been requested.
> Thus, these servers are slow, waiting several seconds for each
> response, but they do not put a large load on the local processor.
> So I want many of them running in parallel.

Makes sense.

> Are you proposing that I use Apache/mod_perl child processes to do
> the transactions to the mainframe?  That doesn't seem right.
> They are then not available to listen for HTTP requests, which is
> the whole purpose of an apache child process.

That's the point.  By using the Apache/mod_perl processes for all
work, you can easily design for peak load.  It's all work.  You can't
serve HTTP requests anyway if your machine is overloaded doing "work"
of other sorts.

We do this for e-commerce, web scraping, mail handling, etc.
Everything goes through Apache/mod_perl.  No fuss, no muss.

> You seem to advocate Apache/mod_perl for end-user (returning HTML)
> and server-to-server (RPC) use.  That makes sense.
> But it doesn't seem to make sense for my family of servers that
> spend all of their time waiting for the mainframe to return their
> next transaction.

Can you do asynchronous I/O?  You'll be a lot more efficient memory-
and CPU-wise if you send a series of messages and wait for the results
to come in.  Consuming a Unix/Mainframe process slot (or even a
thread) for something like this is very inefficient.
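A minimal sketch of that idea in Python (illustrative only -- the original
discussion is about mod_perl, and the mainframe round trip is simulated
here with a sleep): one process issues many slow transactions concurrently
instead of tying up a process or thread per pending response.

```python
import asyncio

async def mainframe_txn(request_id: int) -> str:
    # Stand-in for one slow, I/O-bound mainframe round trip.
    # While this coroutine waits, the process is free to run others.
    await asyncio.sleep(0.01)
    return f"result-{request_id}"

async def run_batch(n: int) -> list[str]:
    # Fire off all transactions at once, then wait for every response.
    return await asyncio.gather(*(mainframe_txn(i) for i in range(n)))

results = asyncio.run(run_batch(50))
```

The point of the sketch: fifty pending transactions cost one process and
fifty small coroutine objects, not fifty process slots.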

I worked on a CORBA-based Web server for Tandem, which didn't use
threads.  Instead the servers would do asynchronous I/O to the
resources they were responsible for.  I built the CGI component, which
on Tandem was a gateway to Tandem's transaction monitor, Pathway.
All CGI processes were managed by a single process which accepted
requests via CORBA and fired off messages to Pathway.  When Pathway
would respond, the CORBA response would be sent.  Replace CORBA with
HTTP, and you have a simpler, more efficient solution.

One other trick you might try is simply hanging onto the HTTP request
until all the jobs for a particular user finish.  If you have, say, 50
jobs, and they run in parallel, they might finish within 30 seconds,
which is short enough for a person to wait, and that way you don't
have to deal with the whole database/polling/garbage collection piece.
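That trick can be sketched like this (again in Python for illustration;
the job body, the job count, and the 30-second budget are stand-ins, not
details from the original post): the request handler blocks until all of
the user's parallel jobs complete or a deadline passes, so no result
store or polling loop is needed.

```python
from concurrent.futures import ThreadPoolExecutor, wait
import time

def job(i: int) -> str:
    # Stand-in for one slow mainframe transaction.
    time.sleep(0.01)
    return f"job-{i}-done"

def handle_request(n_jobs: int, deadline_s: float = 30.0) -> list[str]:
    # Run the user's jobs in parallel and hold the "request" open
    # until they all finish, or give up at the deadline.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(job, i) for i in range(n_jobs)]
        done, not_done = wait(futures, timeout=deadline_s)
        if not_done:
            raise TimeoutError("jobs exceeded the request deadline")
        return [f.result() for f in futures]

results = handle_request(50)
```

Because the jobs run concurrently, the caller waits roughly as long as
the slowest single job, not the sum of all of them.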

Rob