Re: asynchronous execution, was Re: implementing a set of queue-processing servers

Perrin Harkins Tue, 26 Nov 2002 22:43:53 -0800

Bas A.Schulte wrote:
> Quite odd. I read the performance thread that's on the P5EE page which
> showed that DBI (with MySQL underneath) was very fast, came in 2nd.
> Anyone care to elaborate why this is? After all, shared-memory is a
> thing in RAM, why isn't that faster?

I have an article that I'm working on which explains all of this, but the short explanation is that the shared-memory modules work by serializing the entire memory structure with Storable and stuffing it into a shared memory segment, and even reading it requires loading and de-serializing the whole thing. IPC::MM and the file-based ones are much more granular. Also, file systems are very fast on modern OSes because of efficient VM systems that buffer files in memory.
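To make the granularity point concrete, here is a minimal sketch in Python (picked only for brevity; the modules under discussion are Perl). It contrasts a single serialized blob, where reading one key means de-serializing everything, with a store that serializes each value separately. `pickle` stands in for Storable, a plain dict stands in for the granular store, and the record names and counts are made up for illustration.

```python
import pickle

# Hypothetical cache contents: 1000 small records.
cache = {f"user:{i}": {"id": i, "name": f"name{i}"} for i in range(1000)}

# Whole-structure approach: one serialized blob holds everything.
blob = pickle.dumps(cache)

def read_from_blob(key):
    """Return the value and how many bytes had to be de-serialized."""
    data = pickle.loads(blob)      # decodes the entire structure
    return data[key], len(blob)

# Granular approach: each value is serialized under its own key.
per_key = {k: pickle.dumps(v) for k, v in cache.items()}

def read_granular(key):
    """Return the value and how many bytes had to be de-serialized."""
    raw = per_key[key]             # only this one entry is touched
    return pickle.loads(raw), len(raw)

value_a, bytes_a = read_from_blob("user:42")
value_b, bytes_b = read_granular("user:42")
print(value_a == value_b, bytes_a, bytes_b)
```

Both reads return the same record, but the blob version decodes all 1000 entries to get at one of them, and that cost grows with the size of the whole structure rather than the size of the item.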

> I'm not saying I want entity beans here ;) It's just that I've been
> doing perl to pay for bills and stuff the past few years and see a lot
> of people having some (possibly perceived?) need for something missing
> in perl.

It may be that they just want someone to tell them how they should do
things. J2EE does provide that to a certain degree.

> If I read your mail, you mention some solutions/directions for some
> problems I'm dealing with, but that's just my issue (I think; it's just
> coming to me): we have a lot of "raw metal" but we do have to do a lot
> of welding and fitting before we can solve our business problems.
>
> That is basically the point.

I don't think it's nearly that bad. After my eToys article got published, I got several e-mails from people saying something like "we want to do this, but our boss says we have to buy something because of all the INFRASTRUCTURE code we would have to write."

Infrastructure? What infrastructure? The only stuff we wrote that was really independent of our application logic were things like a logging class and a singleton class, which can now be had on CPAN. We wrote our own cache system, but that's because it worked in a very specific way that the available tools didn't handle. I think I could do that with CPAN stuff now too.

> To illustrate that, I'll try to give a real-world example

Thanks, it's much easier to talk about specific situations.

> To deliver these messages, I send them off to another server (using my
> own invented pseudo-RMI to call a method on that server).

I would use HTTP for that, because I'm too lazy to write the RMI code myself.

> 1. The server that does the delivery has plenty of threads (errrrr, a
> Apache/mod_perl child) so I hope I have enough of them to deliver the
> messages at the rate the backend server generates them: one child might
> take up to 5 seconds to deliver the message but there are plenty childs.
>
> Not good. I've seen how this works and miserably fails when a delivery
> mechanism barfs.

If they were so quick to process that you could do it that way, I would
have just handled them in the original mod_perl server with a
cleanup_handler. Obviously they are not, so that's not an option here.

> 2. Same as 1 but I never allow one delivery mechanism to use all my
> Apache/mod_perl children by adding some form of IPC (darned, need to
> solve my data sharing issues first!)

I think they are already solved if you look at the modules I suggested.

> so the children check what the
> others are currently doing: if a request comes in for a particular
> delivery mechanism, I check if we're already doing N delivery attempts
> and drop the request somewhere (database/file, whatever) if not. I have
> a daemon running that monitors that queue.

I would structure it like this:
- Original server takes request, and writes it to a database table that
holds the queue.
- A cron job checks the queue for messages, reads the status from
MLDBM::Sync to see if we have free processes, and passes the request to
mod_perl if we do. (Note that this could also be done with something
like PersistentPerl instead.) If there are no free processes, they are
left on the queue.

> That daemon gets complicated quickly as it also has to throttle delivery
> attempts

My approach only puts that logic in the cron job.

> I need some form of persistent storage (with locking)

The relational database. Or MLDBM::Sync if you prefer.

> what do
> I do when the delivery mechanism has failed for 6 hours and I have 12000
> messages in the queue *and* make sure current messages get sent in time?

I don't know, that's an application-specific choice. Of course JMS
doesn't know either.

> 3. I install qmail on the various servers, and use that to push messages
> around. This'll take me a week or so (hopefully) to get it running
> reliably in production

One of the major selling points for qmail is easier setup. You could
use pretty much any mail server though if you have more experience with
something else. I just like qmail because it's fast.

> Later on, I
> realise that for each messages, a fullblown process is forked *per
> message*: load up perl, compile perl code etc..

I described how to avoid this in another message: use PersistentPerl or
equivalent, or pass things off to mod_perl. Note that both of these
include a way to limit concurrency.

> 1. I write a 6-liner message bean.

More than that I think, plus a bunch of configuration files.

> I can configure a lot of things in the JMS
> provider like the max. number of concurrent invocations of the message
> bean so I don't overload my machine.

But can it do the sort of throttling you wanted, where it will impose limits based on protocol or on destination? If not, you're back to doing it yourself.
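For what it's worth, the do-it-yourself version of that throttling does not have to be large. Here is a minimal sketch in Python, assuming each queued message is tagged with its delivery mechanism and that the number of in-flight deliveries per mechanism is known; the mechanism names, limits, and function name are all hypothetical.

```python
from collections import Counter

# Hypothetical caps on concurrent deliveries per mechanism.
LIMITS = {"smtp": 2, "sms": 1}
DEFAULT_LIMIT = 1

def select_for_delivery(pending, in_flight):
    """Split pending messages into those to send now and those to defer.

    pending:   list of (mechanism, payload) tuples, oldest first
    in_flight: Counter of deliveries currently running per mechanism
    """
    running = Counter(in_flight)  # copy, so the caller's counts are untouched
    chosen, deferred = [], []
    for mechanism, payload in pending:
        if running[mechanism] < LIMITS.get(mechanism, DEFAULT_LIMIT):
            running[mechanism] += 1
            chosen.append((mechanism, payload))
        else:
            deferred.append((mechanism, payload))
    return chosen, deferred

pending = [("smtp", "a"), ("smtp", "b"), ("smtp", "c"), ("sms", "d"), ("sms", "e")]
chosen, deferred = select_for_delivery(pending, Counter({"smtp": 1}))
print(chosen, deferred)
```

Because each mechanism has its own cap, one that is stuck or slow can hold at most its own share of the workers, and the deferred messages simply stay on the queue.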

> I could probably configure an
> expiry attribute to messages so messages that have become too old are
> dropped by the JMS provider.

I doubt it. You probably have to code that yourself.

I think message beans do sound attractive in this situation, and there is some infrastructure code to write here on the Perl side (putting messages in the database, reading them back out of it, triggering the appropriate handling code, and maybe throttling if you can find a way to make it generic enough). It wouldn't be hard to make some modules that do what I described with cron, an RDBMS, and mod_perl, and that might be a good addition to CPAN. Or maybe a generalization of Jason's Spread::Queue is in order. People do seem to bring up messaging/queues a lot on this list.

> Well, this is just one example, I was going to give 2 more, one on
> data-sharing between "threads" (Apache/mod_perl child in perl, thread in
> java)

I think I've got that covered in my article. It's easy to share data if you use a good module like MLDBM::Sync or IPC::MM.

> and one on remote method invocation (remote servers runs
> instances, I call methods on them from client machine)

I still think HTTP is the best way to go unless your problem totally doesn't fit it.

> BTW, object persistency would also be an interesting topic
> (Alzabo/Tangram/SPOPS/Class::DBI/hand-coded etc.)

That one has been done quite a bit in several forums. The next step is for someone to actually code a sample app with real OO stuff (inheritance, polymorphism) with several of these and report on performance and ease of use. That's a lot of work, so no one has done it yet.

- Perrin
