On Fri, Nov 15, 2002 at 11:45:33AM -0500, Stephen Adkins wrote:
> Hi,
> 
> I have the following requirement, and I am seeking your input.
> 
>  1. web-based users make requests for data which are put
>     in a queue
>  2. these requests and their status need to be in a database
>     so that users can watch the status of the queue and their
>     requests in the queue
>  3. a set of servers process the requests for data from the
>     queue and put the results in a results table so that
>     users can view their data when their requests are done
> 
> QUESTIONS:
> 
>  * What queue mechanism would you use, assuming all of the
>    writers and readers are on the same system?
>    (IPC::Msg? MsgQ?)

If speed is a major factor, I would use a FIFO (named pipe).  This is
a very lightweight and fast way to pass data between processes on the
same machine.
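To illustrate, here's a minimal sketch of that round trip using only core Perl (the FIFO path and payload are made up, and real code would add error handling and a line-framing convention):

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);

my $fifo = "/tmp/job_queue_demo.$$";       # hypothetical path
mkfifo($fifo, 0600) or die "mkfifo: $!";

my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    # Child: a web process enqueues a request by writing one line.
    # Opening for write blocks until a reader opens the other end.
    open my $out, '>', $fifo or die "open for write: $!";
    print $out "request-42\n";
    close $out;
    exit 0;
}

# Parent: the queue worker blocks on open/read until a line arrives.
open my $in, '<', $fifo or die "open for read: $!";
my $job = <$in>;
close $in;
waitpid $pid, 0;
unlink $fifo;

print "got: $job";                          # got: request-42
```

One caveat: a plain FIFO holds nothing once both ends close, so anything still in flight is lost if the reader dies.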

>  * How about if the queue writers were distributed, but the
>    queue readers were all on one machine? (RPC to insert into
>    the above-mentioned local queues?)
>  * How about if the queue writers and queue readers were all
>    distributed around the network? (Spread::Queue::FIFO?
>    Parallel::PVM? Parallel::MPI? MQSeries::Queue?)

Your requirement #2 seems to indicate that the queue is held in a
database table.  In that case the queue is inherently distributable.
Each machine makes its own connections to the database and processes
tasks in the queue using whatever locking is necessary.

This requires queue workers to poll the database for new jobs, which
you later state is something you're trying to avoid.
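If the table route wins, the usual trick is to claim a row with a conditional UPDATE, so the dequeue is atomic even with several workers racing. A rough sketch, with a hypothetical `requests` table and shown here against an in-memory SQLite handle via DBI:

```perl
use strict;
use warnings;
use DBI;

# Hypothetical schema: requests(id, status, worker),
# where status is one of 'new', 'working', 'done'.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do(q{CREATE TABLE requests (id INTEGER PRIMARY KEY,
                                  status TEXT, worker TEXT)});
$dbh->do(q{INSERT INTO requests (status) VALUES ('new'), ('new')});

# Claim the oldest 'new' row.  The conditional UPDATE is the lock:
# if a rival worker claimed the row first, zero rows change and we
# simply pick another candidate.
sub claim_next_job {
    my ($dbh, $worker) = @_;
    while (1) {
        my ($id) = $dbh->selectrow_array(
            q{SELECT id FROM requests WHERE status = 'new'
              ORDER BY id LIMIT 1});
        return undef unless defined $id;    # queue is empty
        my $rows = $dbh->do(
            q{UPDATE requests SET status = 'working', worker = ?
              WHERE id = ? AND status = 'new'}, undef, $worker, $id);
        return $id if $rows == 1;           # claimed; else lost the race
    }
}

print claim_next_job($dbh, 'worker-a'), "\n";   # 1
print claim_next_job($dbh, 'worker-b'), "\n";   # 2
```

Because the claim is a single statement, this works unchanged whether the workers share a machine or are spread across the network.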

>  * What Perl server-building software on CPAN do you recommend,
>    and why? (Net::Server? Net::Daemon? POE?)
>    I started working with Net::Server, but it seems focused
>    on being a network request multi-server, not a queue working
>    multi-server.

I recommend POE, which is a generic near-real-time system with a
special focus on networking.

It is extremely flexible.  Designers can use the relatively low-level
select-like functions to roll their own networking toolkits, or they
can use the high-level classes available on the CPAN.  Homegrown I/O
abstractions will work with canned components because they share the
same basic events and I/O library underneath.

It's also very portable.  Servers have been tested on FreeBSD,
Windows, Solaris, Linux, OS/2, Mac OS (Classic and X), HP-UX and
Tru64.

Back-end POE components will work with a variety of event systems,
from simple select() and poll() loops to graphical toolkits like Tk
and Gtk.  It's possible to mix and match front ends (graphical, text, web,
etc.) with back-end components.  A single back end can have multiple
front ends, as long as they're compatible.  Some examples:

- a telnet chat server with a web front end
- a web crawler client with a command line console
- an IRC bot with a web control panel
- an IRC client with a Tk user interface

I could go on for days about how cool this stuff is. :)

You will need to pass messages between clients and servers.
http://poe.perl.org/?POE_Cookbook/Application_Servers shows one way to
do that.  It uses POE::Filter::Reference to transparently serialize
and reconstitute Perl data structures across a socket.  That class may
also be used in stand-alone programs, if you'd rather not base the
clients on POE's task manager.
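For the curious, the wire format is roughly a byte count, a NUL, and a Storable-frozen structure.  A stripped-down sketch of that round trip using core Storable alone (the task structure is made up):

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Sender side: freeze a Perl structure and length-prefix it for the wire.
my $request = { user => 'alice', query => 'report-7' };   # hypothetical task
my $frozen  = nfreeze($request);
my $wire    = length($frozen) . "\0" . $frozen;

# Receiver side: the length digits can't contain NUL, so the first NUL
# is always the header separator; split it off and thaw the payload.
my ($len, $rest) = split /\0/, $wire, 2;
my $copy = thaw(substr $rest, 0, $len);

print "$copy->{user} wants $copy->{query}\n";   # alice wants report-7
```

The length prefix matters on a stream socket: it tells the receiver where one frozen structure ends and the next begins.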

>  * Would you implement it as many peer-level servers waiting on
>    a single queue? or a single parent server waiting on the queue,
>    dispatching queued work units to waiting child servers?

I recommend one or just a few peer-level servers, but as you'll see
I keep coming back to using the database as a persistent queue.

The FIFO case cited above uses a single parent server and a pool of
database processes managed by POE::Component::LaDBI.  FIFOs are very
cheap: PHP pages can write to them as if they were files, and the
database insert server can read from it with POE::Wheel::FollowTail.
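In outline, the reading end of that arrangement looks something like this (assuming POE is installed; the FIFO path is hypothetical):

```perl
use strict;
use warnings;
use POE qw(Wheel::FollowTail);

# A session tails the FIFO and fires got_line for each enqueued request.
POE::Session->create(
    inline_states => {
        _start => sub {
            $_[HEAP]{tail} = POE::Wheel::FollowTail->new(
                Filename   => '/tmp/job.fifo',   # hypothetical path
                InputEvent => 'got_line',
            );
        },
        got_line => sub {
            my $request = $_[ARG0];
            # ... hand $request to a database worker here ...
            print "dequeued: $request\n";
        },
    },
);
POE::Kernel->run;
```

The wheel handles reopening and blocking for you, so the session just receives lines as they arrive.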

A previous version of that program used INET sockets.  The overhead of
connecting, accepting, and disconnecting both ends of each socket
reduced the server's throughput by a factor of four: the author saw
~100 transactions a second instead of the ~400 achieved with a FIFO.

Though I haven't tested it, I suspect that moving the clients
off-machine will add another 50 or so transactions per second.

> QUICK AND DIRTY SINGLE-SERVER SOLUTION
> 
> I implemented a quick-and-dirty single-server solution, where
> I use a single server to process requests.  I simply poll the
> request table in the database once a minute for new requests,
> and if they exist, I process them.
> 
> Now I am looking to upgrade this for higher throughput (multiple
> parallel servers), lower background load (no polling during quiet
> periods), and lower latency (immediate response to queue insertion
> rather than waiting for the next poll interval).
> 
> MY HUNCHES
> 
> I think I'll use IPC::Msg as the queue because the queue readers
> will all be on one machine.  I'll also have to implement a simple RPC
> server (using Net::Server) to perform remote insertions into the 
> local queue.  If this seems too rough, I'll probably install the
> Spread Toolkit and use Spread::Queue.
> 
> I currently think I'll keep working with Net::Server to see if I
> can use it to process a queue rather than listen on a network port,
> but I'm not sure that this is the right use of the module.
> I may end up ditching this effort and just have a set of parallel
> servers all waiting on the queue.  The queue mechanism itself will
> work out who gets to work on which request.
> 
> Any input?

Depending on how critical your transactions are, it may be more
reliable to use the database as the queue.  Jobs passed through it are
saved to persistent storage, making them more likely to survive a
crash.  Do you need to roll forward unprocessed tasks if you must
restart the server?

If you use the database as the queue, the message passing between
clients and servers amounts to little more than a wake-up call: Hey,
you've got a task!

I briefly considered broadcasting UDP packets to wake up one or more
servers, but they would all then poll the database at once.  That's a
lot of overhead if there are fewer tasks than servers to handle them.

I then considered one or a few parent servers, each with multiple
worker processes behind them.  This is the model that the cited FIFO
server uses.  The wake-up message would be handled by the parent
process, which would in turn farm out the task to one of its children.
That prevents a large bolus of database hits every time a task is
enqueued.
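That parent-and-children shape needs nothing exotic; here's a bare-bones sketch with core fork and pipes (the worker count and task names are made up, and a real server would track idle children rather than dealing tasks round-robin):

```perl
use strict;
use warnings;
use IO::Handle;

# Pre-fork a small pool; each worker gets its own pipe from the parent.
my @workers;
for my $n (1 .. 2) {
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork() // die "fork: $!";
    if ($pid == 0) {
        close $w;
        close $_->{fh} for @workers;       # drop siblings' write ends
        while (my $task = <$r>) {          # block until the parent sends work
            chomp $task;
            # ... query the database for $task, store the results ...
            print "worker $$ handled $task\n";
        }
        exit 0;
    }
    close $r;
    $w->autoflush(1);
    push @workers, { pid => $pid, fh => $w };
}

# The parent answers the wake-up call and farms each task out to one
# worker, so a new task touches one process instead of the whole pool.
my $next = 0;
for my $task (qw(req-1 req-2 req-3)) {
    my $worker = $workers[ $next++ % @workers ];
    print { $worker->{fh} } "$task\n";
}

close $_->{fh} for @workers;               # EOF tells the children to quit
waitpid $_->{pid}, 0 for @workers;
```

Closing the siblings' inherited write ends in each child matters: otherwise a child can hold another child's pipe open and keep it from ever seeing EOF.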

Good luck.

-- Rocco Caputo - [EMAIL PROTECTED] - http://poe.perl.org/
