Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-27 Thread Rob Nagler
Perrin Harkins writes:
 Bas A.Schulte wrote:
   I do when the delivery mechanism has failed for 6 hours and I have 12000
   messages in the queue *and* make sure current messages get sent in time?
 
 I don't know, that's an application-specific choice.  Of course JMS
 doesn't know either.

This is one of the endemic problems with J2EE.  It doesn't know, and
it has to offer you lots of options to allow you to control the
horizontal and vertical.  Since it is a distributed platform, it can't
export hooks (callbacks), which allow you to decide on the fly.  The
options get out of control, and make it look like the system is
fancier than it really is.  Rather, when you see an option, it usually
means the developers couldn't agree on what to do (paraphrased from
Joel Spolsky, http://www.joelonsoftware.com/).

With bOP, we tend to make policy decisions like this centrally, e.g.,
no exactly-once semantics.  There's a real cost, but then we've used
bOP for a wide variety of batch and Web applications without much
strain so we keep doing it this way.  When we stress the system too
much, we add a decision point (option) for the programmer.  However,
we only do this after careful deliberation.  This is one of the
reasons we don't release bOP in parts as some have suggested.  You
can use it in layers, but every application we've built ends up using
all the layers.

J2EE has too many competing/conflicting components, and each of those
components can be configured in myriad ways.  Only experienced
distributed-systems builders know the trade-offs.  J2EE
is sold as an everyman's platform for everybody's problem.  This means
people often get caught using the wrong tool (entity beans) the wrong
way (a bean per DB row).  There's no easy answer to the problem of
distributed systems (esp. one as complex as SMS message queueing), and
J2EE gives one the impression there is, all imho, of course. :-)

BTW, the issue of exactly-once vs at-most-once is a tough one (and was
subject to much debate in the 80s).  JMS tries to guarantee
exactly-once, but that's really hard to do.  Especially in an SMS
situation where network partitioning is a real problem.  My
alphanumeric pager service holds messages for 3 days, and that's a
long time imo.  They can only do this, because pagers aren't
bi-directional (for the most part).  Once you get into SMS space,
where devices are bi-directional and much more useful, you have a real
problem promising exactly-once semantics.

Rob





Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Bas A. Schulte
Hi all,

On Tuesday, November 19, 2002, at 11:09 PM, Perrin Harkins wrote:


Stephen Adkins wrote:


So what I think you are saying for option 2 is:

   * Apache children (web server processes with mod_perl) have two
 personalities:
   - user request processors
   - back-end work processors
   * When a user submits work to the queue, the child is acting in a
 user request role and it returns the response quickly.
   * After detaching from the user, however, it checks to see if fewer
     than four children are processing the queue and if so, it logs into
     the mainframe and starts processing the queue.
   * When it finishes the request, it continues to work the queue until
     no more work is available, at which time, it quits its back-end
     processor personality and returns to wait for another HTTP request.

This just seems a bit odd (and unnecessarily complex).


It does when you put it like that, but it doesn't have to be that way.


I've implemented the exact thing Perrin describes in our SMS game 
platform (read a bit about it here: 
http://perl.apache.org/outstanding/success_stories/sms_server.html).

When synchronous requests come in that trigger some event that has to 
take place in the future *and* that runs in the same Apache server 
instance, I have an external (simple) daemon that reads timer events 
from a shared database table and posts HTTP requests to the Apache 
server instance.

The reason I did it like this is that I can easily (not to mention 
quickly) run perl code in Apache *and* it is quite a stable server, much 
more stable than something I could whip out in perl. I did try some perl 
preforking server code (from Lincoln D. Stein's book and 
Net::Server::PreFork as well as some self-programmed stuff) but none of 
them seemed to be stable/fast under heavy load even though I would have 
preferred that as it would allow me to do something to handle 
data-sharing between children via the parent, which always seems to be an 
issue in Apache/mod_perl.

The only thing that now and then is problematic is that Apache child 
processes in which my perl code runs are not easily coordinated (at 
least I still haven't found a good way). So this situation (from 
Stephen's mail):

We have a fixed number of mainframe login id's, so we can only run a
limited number (say 4) of them at a time.


still is something I haven't figured out. Basically, I need some way to 
coordinate the children so each child can find out what the other 
children are doing.

BTW: I've been reading up a lot on J2EE lately and it appears more and 
more that a J2EE app server could quite nicely provide for my needs 
(despite all shortcomings and issues of course). Now if there only was a 
P5EE app server ;)

Regards,

Bas.




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Rob Nagler
Bas A.Schulte writes:
 still is something I haven't figured out. Basically, I need some way to 
 coordinate the children so each child can find out what the other 
 children are doing.

Use a table in your database.  The DB needs to support row level
locking (we use Oracle).   Here's an example:

insert into resource_lock_t (instance_count) values (1)

Don't commit yet.  Rather, right before committing, delete the row:

delete from resource_lock_t where instance_count = 1

Anybody waiting for instance_count #1 will block until the delete
happens.  You only allow up to four inserts (instance_count is the
primary key).
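Sketched in DBI, the pattern might look like this.  This is a minimal
sketch, not Rob's actual code: the connect string is a placeholder, and
an Oracle-style database with row-level locking is assumed, so a second
INSERT of a held primary key blocks until the holder commits or rolls
back.

```perl
use strict;
use warnings;
use DBI;

# Connection details are invented; assumes Oracle-style row locking.
my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'pass',
    { AutoCommit => 0, RaiseError => 1 });

my $MAX_SLOTS = 4;
my $slot = 1 + $$ % $MAX_SLOTS;   # crude slot choice per process

# Blocks here while another transaction holds this slot.  If the holder
# commits without deleting, this raises a duplicate-key error, which
# should be treated as "slot busy" and retried with another slot.
$dbh->do('insert into resource_lock_t (instance_count) values (?)',
    undef, $slot);

# ... log into the mainframe and work the queue here ...

# Right before committing, delete the row to release the slot.
$dbh->do('delete from resource_lock_t where instance_count = ?',
    undef, $slot);
$dbh->commit;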

Rob





Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Perrin Harkins
Bas A.Schulte wrote:

none of
them seemed to be stable/fast under heavy load even though I would have 
preferred that as it would allow me to do something to handle 
data-sharing between children via the parent which always seems to be an 
issue in Apache/mod_perl.

What are you trying to share?  In addition to Rob's suggestion of using 
a database table (usually the best for important data or clustered 
machines) there are other approaches like IPC::MM and MLDBM::Sync.

Basically, I need some way to 
coordinate the children so each child can find out what the other 
children are doing.

Either of the approaches I just mentioned would be fine for this.


BTW: I've been reading up a lot on J2EE lately and it appears more and 
more that a J2EE app server could quite nicely provide for my needs 
(despite all shortcomings and issues of course).

What is it that you think you'd be getting that you don't have now?

- Perrin




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Perrin Harkins
Nigel Hamilton wrote:

	I need to fork a lot of processes per request ... the memory cost 
of forking an apache child is too high though.
	
	So I've written my own mini webserver in Perl

It doesn't seem like this would help much.  The thing that makes 
mod_perl processes big is Perl.  If you run the same code in both they 
should have a similar size.

- Perrin



Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Bas A. Schulte
Hi Perrin,

On Tuesday, November 26, 2002, at 06:14 PM, Perrin Harkins wrote:


Bas A.Schulte wrote:

none of
them seemed to be stable/fast under heavy load even though I would 
have preferred that as it would allow me to do something to handle 
data-sharing between children via the parent which always seems to be 
an issue in Apache/mod_perl.

What are you trying to share?  In addition to Rob's suggestion of using 
a database table (usually the best for important data or clustered 
machines) there are other approaches like IPC::MM and MLDBM::Sync.

I don't want to use a database table for the sole purpose of sharing 
data, I mean, I run the Apache/mod_perl servers to handle different 
components of our system, some run on top of a database and some of them 
don't.

Also, the things I would want to share are fairly dynamic things so a 
roundtrip to a database would probably add quite some overhead.

I have been looking at some of the IPC::Share* modules, the one I think 
I can use is (not sure here) IPC::ShareLite, but that darned thing won't 
install on my dev. machine (iBook/OS X) so I've been postponing things a 
bit ;)

My current plan is IPC::MM, stay tuned.


As to *what* I'm trying to share: I don't really know yet ;) Dynamic 
stuff like:

- what is a given child doing (to do things like: ok, I'm currently 
pushing data to some client in 5 children, and I don't want to have 
another child do this now so stuff this task in a queue somewhere so I 
can process it later);
- application state. This is domain-specific so it's a bit hard to 
explain what I mean. I need serialized and *fast* access to this info so 
I would prefer not having this in my database.

NB: I posted a question on the first issue (look for "IPC suggestions 
sought/talking between children?" somewhere in the mod_perl mailing list; 
I never seem to recall the proper archive site for it), but didn't get any 
feedback on it as it probably goes beyond what someone would normally 
want from a web server.
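Until one of those IPC modules installs cleanly, a crude coordination
scheme can be built from core modules alone.  The sketch below (file
name and status fields are invented for illustration) shares a status
hash between children through a Storable file guarded by flock:

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use Storable qw(store_fd fd_retrieve);

# Hypothetical status file shared by all Apache children.
my $STATUS_FILE = '/tmp/child_status.stor';
unlink $STATUS_FILE;   # demo only: start with a clean slate

# Atomically read-modify-write the shared status hash under an
# exclusive lock, applying the caller's code ref to it.
sub update_status {
    my ($code) = @_;
    sysopen my $fh, $STATUS_FILE, O_RDWR | O_CREAT, 0644
        or die "open $STATUS_FILE: $!";
    flock $fh, LOCK_EX or die "flock: $!";
    my $status = -s $fh ? fd_retrieve($fh) : {};
    $code->($status);              # e.g. record what this child is doing
    seek $fh, 0, 0;
    truncate $fh, 0;
    store_fd($status, $fh);
    close $fh;
    return $status;
}

# A child records its current task...
update_status(sub { $_[0]{$$} = { task => 'push_to_client', since => time } });

# ...and any child can count how many are busy with that task.
my $status = update_status(sub {});
my $busy = grep { $_->{task} eq 'push_to_client' } values %$status;
```

A database table or IPC::MM scales better under load; this only shows
the shape of the API.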


BTW: I've been reading up a lot on J2EE lately and it appears more and 
more that a J2EE app server could quite nicely provide for my needs 
(despite all shortcomings and issues of course).

What is it that you think you'd be getting that you don't have now?


Again; I don't know exactly but when I read stuff about entity-, 
session- and message beans, JMS etc., it has a lot of resemblance with 
what I'm currently doing by hand i.e. implement functionality like 
that on top of a bare Apache/mod_perl server.

A good example would be JMS: you get this for free (with JBoss 
anyway ;)) in a J2EE app. server but there's no obvious choice for us 
perl guys. There are some options I see now and then: Spread/Stem/POE, 
but none of these choices are obvious in the sense that they are being 
used by a lot of people to solve the type of problems JMS solves so 
there's really no one to turn to for advice; again, I'm building stuff 
between the raw metal and my own stuff.

BTW: the same goes for the data-sharing issue: I have raw metal 
(Apache/mod_perl and IPC::MM) and need to implement an API on top of them 
before I have the needed functionality. Again I'm building infrastructure 
before I can solve my actual business problems.

I think these issues point out that we are missing *something*, I know 
*I* am :)

Regards,

Bas.




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Stephen Adkins
At 07:04 PM 11/26/2002 +0100, Bas A.Schulte wrote:
On Tuesday, November 26, 2002, at 06:14 PM, Perrin Harkins wrote:
 Bas A.Schulte wrote:

I have been looking at some of the IPC::Share* modules, the one I think 
I can use is (not sure here) IPC::ShareLite, but that darned thing won't 
install on my dev. machine (iBook/OS X) so I've been postponing things a 
bit ;)

My current plan is IPC::MM, stay tuned.

Hi,

Take a look at

   http://www.officevision.com/pub/p5ee/components.html#shared_storage

There are references to every major shared storage method I have seen
discussed on the mod_perl list or elsewhere.

There are also some interesting links to mod_perl list discussions
on performance comparison and synchronization using the various
tools.

Stephen





Re: implementing a set of queue-processing servers

2002-11-26 Thread Rocco Caputo
On Mon, Nov 25, 2002 at 07:31:35PM -0700, Rob Nagler wrote:
 Matt Sergeant writes:
  There's a huge difference in what they are trying to achieve though. 
  POE doesn't open any files and it doesn't write any files to disk. None 
  of it is written in C (yet), so unless there's a buffer overrun or type 
  mismatch bug in perl you can exploit, you're not going to get in that 
  way.
 
 I agree that Perl is a safe language (independent of taint, which
 adds safety).  Unfortunately, there has been a history of insecure Perl
 programs (formail.pl, I think being the most famous).  This may be
 a consequence of bad programming, but you have to look at the
 average if you are selecting a system without reviewing every line of
 code, i.e., performing a security audit.

Rating all of CPAN according to the quality of the average module does
a disservice to its better half.  Deprecating its good distributions
also feeds into the myth that all Perl software is shoddy.

 I trust Linux more than Apache, for example, because Linux is not only
 older, but was built using an interface design which is 30 years old
 and has been allowed to evolve.

It seems naive to assume that an older project is more reliable than a
younger one.  Inception dates have no bearing on the age and quality
of source code, otherwise djbdns would be considered less reliable
than bind.

  I'm not honestly suggesting it's bug free, but I fail to see how a bug 
  in POE would give you access to the system.
 
 Use of a user string incorrectly in a system or open might do it.
 Also, an incorrect chown, chmod, umask, etc.

A casual grep through POE's source would reveal that it doesn't do any
of this.

You seem to be making claims against POE based on broad generalization
rather than research.  Regardless of your intent, representing these
opinions as facts does damage the project's reputation, since they are
available out of context and forever through the list's archives.

-- Rocco Caputo - [EMAIL PROTECTED] - http://poe.perl.org/



Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Nigel Hamilton
 
 Quite odd. I read the performance thread that's on the P5EE page which 
 showed that DBI (with MySQL underneath) was very fast, came in 2nd. 
 Anyone care to elaborate why this is? After all, shared-memory is a 
 thing in RAM, why isn't that faster?
 

Hi Bas,

You made some really interesting points in your last email ... and 
I hope it sparks a full discussion. 

Just a quick point on the MySQL observation above ... MySQL
Memory-Hash Tables may be even quicker, again - as the disk is not
involved. Your messages could be inserted into a buffer table with a
microsecond timestamp and then a separate process(es) pops messages off
the queue. This hands the memory consumption problem to MySQL and provides
multiple ways of talking to the queue (cronjobs, apache kids etc).

At Turbo10, our click-through system choked under heavy load until
we implemented it as a memory buffer (MySQL hash table) ... just a 
thought.
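A sketch of that buffer-table idea in DBI follows.  All table, column
and connection names are invented; TYPE=HEAP was the MySQL 3.x/4.0
spelling (ENGINE=MEMORY in later versions), and HEAP tables cannot hold
BLOB/TEXT columns, hence the VARCHAR.

```perl
use strict;
use warnings;
use DBI;
use Time::HiRes qw(gettimeofday);

my $dbh = DBI->connect('dbi:mysql:queue', 'user', 'pass',
    { RaiseError => 1 });

# In-RAM buffer table; the disk is not involved on insert or select.
$dbh->do(q{
    CREATE TABLE IF NOT EXISTS msg_buffer (
        stamp   DOUBLE       NOT NULL,   -- microsecond timestamp
        message VARCHAR(255) NOT NULL
    ) TYPE=HEAP
});

# Producer side: a web request drops a message into the buffer.
my ($sec, $usec) = gettimeofday();
$dbh->do('INSERT INTO msg_buffer (stamp, message) VALUES (?, ?)',
    undef, $sec + $usec / 1e6, 'click-through record');

# Consumer side (cron job, apache kid, ...): pop the oldest messages.
my $rows = $dbh->selectall_arrayref(
    'SELECT stamp, message FROM msg_buffer ORDER BY stamp LIMIT 100');
$dbh->do('DELETE FROM msg_buffer WHERE stamp <= ?', undef, $rows->[-1][0])
    if @$rows;
```

In practice the SELECT and DELETE should run under LOCK TABLES (or
similar) so messages inserted between the two statements aren't dropped.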


Nigel


-- 
Nigel Hamilton
Turbo10 Metasearch Engine

email:  [EMAIL PROTECTED]
tel:+44 (0) 207 987 5460
fax:+44 (0) 207 987 5468

http://turbo10.com  Search Deeper. Browse Faster.




Re: implementing a set of queue-processing servers

2002-11-26 Thread Rob Nagler
Rocco Caputo writes:
 Rating all of CPAN according to the quality of the average module does
 a disservice to its better half.  Depreciating its good distributions
 also feeds into the myth that all Perl software is shoddy.

That isn't what I said.  I program Perl daily.  I use a bunch of CPAN
on a daily basis.  It's important to look at the average of all
software.  It's just like I would rather fly in an airplane 100 miles
than drive 100 miles.

 It seems naive to assume that an older project is more reliable than a
 younger one.  Inception dates have no bearing on the age and quality
 of source code, otherwise djbdns would be considered less reliable
 than bind.

Is "old code is good code" a myth then?  It's certainly bandied about
often enough.

  Use of a user string incorrectly in a system or open might do it.
  Also, an incorrect chown, chmod, umask, etc.
 
 A casual grep through POE's source would reveal that it doesn't do any
 of this.

I looked briefly at UserBase.pm, because it seems to have something to
do with security.  I came up with a few questions which weren't easily
resolved.  There are probably good answers to all my questions, but
I'm a fairly experienced programmer and my casual observations didn't
find them.  I wouldn't find easy answers for Apache either, but I
*trust* Apache from its reputation alone.  That's the best I can do,
and that's what I've been arguing about.

Anyway, here's a quick list:

  -d $heap->{Dir} || mkdir $heap->{Dir},0755;

Is $heap->{Dir} supposed to be readable by everybody?  What is
$heap->{Dir}?  Will it contain data from the heap on disk?  What if
there's a clear text password in the heap?

  open FILE,$heap->{File} or
  croak qq($heap->{_type} could not open '$heap->{File}'.);

This contains a small error: there should always be a space after ','.

  unlink "$heap->{Dir}/$href->{user_name}" if $href->{new_user_name};

What if $heap->{Dir} is misconfigured and set to /var/mail? Is POE
running as root?

sub poco_userbase_update {
  my $heap = $_[HEAP];
  my $protocol = $heap->{Protocol};
  my %params   = splice @_,ARG0;

  for($heap->{Cipher}) {

$_ is set, and it isn't local($_).  This is a problem, because other
code gets values.  Always use lexically scoped variables.  Dynamically
scoped variables are a major source of unexpected behavior. Nit:
Barewords are bad imho.  ARG0 and HEAP should be subroutines or
methods.

  my $stm = <<_EOSTM_;
    delete from $heap->{Table}
    where $heap->{UserColumn} = '$href->{user_name}'
_EOSTM_
$stm .= qq[ and $heap->{DomainColumn} = '$href->{domain}'] if
$href->{domain};

This is naive SQL.  What if the user_name or domain has a ' in it?
What if it contains arbitrary code such as:

dontcare' OR user_name like '%

Bad news.  Use '?' for all arguments except constants.  The result
isn't checked to see how many records were deleted either.
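A sketch of the same DELETE rewritten with placeholders, so that a
user_name such as  dontcare' OR user_name like '%  arrives at the
database as inert data.  Identifiers (table and column names) cannot be
bound, so the $heap config must still be trusted or validated
separately; the connection details and sample values here are invented.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'pass',
    { RaiseError => 1 });

# Stand-ins for the module's config and per-request data.
my $heap = { Table => 'user_t', UserColumn => 'user_name',
             DomainColumn => 'domain' };
my $href = { user_name => q{dontcare' OR user_name like '%},
             domain    => '' };

my $stm  = "delete from $heap->{Table} where $heap->{UserColumn} = ?";
my @args = ($href->{user_name});
if ($href->{domain}) {
    $stm .= " and $heap->{DomainColumn} = ?";
    push @args, $href->{domain};
}

# do() returns the number of rows affected, so the result can be checked.
my $deleted = $dbh->do($stm, undef, @args);
warn "delete matched no records\n" if $deleted == 0;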

 You seem to be making claims against POE based on broad generalization
 rather than research.  Regardless of your intent, representing these
 opinions as facts does damage the project's reputation, since they are
 available out of context and forever through the list's archives.

I have no doubt POE is written well and certainly with the best
intentions.  Let that stand in the archives forever.

However, the debate was not about POE vs Apache, but essentially about
old code vs new code--with a side discussion about security through
obscurity.  Given two packages, which I'm not familiar with, I'll take
the older one over the new one any day.

Rob





Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Rob Nagler
Perrin Harkins writes:
 I think you are vastly over-estimating how much effort JMS/EJB/etc. 
 would save you.

EJB doesn't save you anything.  It creates work and complexity,
esp. Entity Beans.  I've built large systems using EJB and Perl.  The
Perl project was built faster, with fewer people, runs more reliably,
runs faster, and the Perl company is still in business, which is the
only point that really counts. :-)

JMS does solve an interesting problem, but don't use Message Beans,
use raw JMS.  Make sure JMS isn't a solution looking for a problem,
though.  Oftentimes, the problem is better and more robustly solved by
implementing pending replies from the server.  This avoids a number
of resource management issues, which can really bog down a server.

Rob






Re: implementing a set of queue-processing servers

2002-11-26 Thread Rocco Caputo
On Tue, Nov 26, 2002 at 04:26:13PM -0700, Rob Nagler wrote:
  Rob Nagler also wrote:
   I trust Linux more than Apache, for example, because Linux is
   not only older, but was built using an interface design which is
   30 years old and has been allowed to evolve.

 Rocco Caputo wrote:
  It seems naive to assume that an older project is more reliable than a
  younger one.  Inception dates have no bearing on the age and quality
  of source code, otherwise djbdns would be considered less reliable
  than bind.

Rob Nagler again:
 Is "old code is good code" a myth then?  It's certainly bandied about
 often enough.

First I'd like to apologize for reading more into your posts than you
intended.  Thanks for making things clear in your last message.

On average, older projects may tend to be more reliable than younger
ones, but "old code is good code" is not a hard rule.  It also applies
more to code than to projects like Linux and Apache as whole things.
The age of a project is no guarantee of the age of its code.

The assertion also assumes at least three things about code.  It
relies on all code being born at the same level of quality.  It
demands that all code progresses towards Quality Nirvana at a constant
rate.  It assumes that updates never make things worse than before.

 Rocco Caputo writes:
  Rob Nagler wrote:
   Use of a user string incorrectly in a system or open might do it.
   Also, an incorrect chown, chmod, umask, etc.
  
  A casual grep through POE's source would reveal that it doesn't do any
  of this.
 
 I looked briefly at UserBase.pm, because it seems to have something to
 do with security.  I came up with a few questions which weren't easily
 resolved.  There are probably good answers to all my questions, but
 I'm a fairly experienced programmer and my casual observations didn't
 find them.  I wouldn't find easy answers for Apache either, but I
 *trust* Apache from its reputation alone.  That's the best I can do,
 and that's what I've been arguing about.

[...]

UserBase is a third-party module using POE, but it's not part of POE
itself.  The relationship between the two is similar to the one
between J. Random CPAN Module and Perl.

Your comments are very useful, though.  Thank you.  I'll forward them
to the module's author.

-- Rocco Caputo - [EMAIL PROTECTED] - http://poe.perl.org/



Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Perrin Harkins
Bas A.Schulte wrote:
 Quite odd. I read the performance thread that's on the P5EE page which
 showed that DBI (with MySQL underneath) was very fast, came in 2nd.
 Anyone care to elaborate why this is? After all, shared-memory is a
 thing in RAM, why isn't that faster?

I have an article that I'm working on which explains all of this, but the 
short explanation is that they work by serializing the entire memory 
structure with Storable and stuffing it into a shared memory segment, 
and even reading it requires loading and de-serializing the whole thing. 
IPC::MM and the file-based ones are much more granular.  Also, file 
systems are very fast on modern OSes because of efficient VM systems 
that buffer files in memory.
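A core-Perl illustration of that cost (sizes are arbitrary, and this
mimics the whole-structure approach in general, not any one module's
exact internals): fetching a single key means deserializing everything.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# A shared structure of 10,000 small entries.
my %big = map { "key$_" => "x" x 100 } 1 .. 10_000;

# What whole-structure sharing does on every read: thaw it all.
my $frozen = freeze(\%big);
my $copy   = thaw($frozen);          # full deserialization just to peek
my $value  = $copy->{key42};

printf "deserialized %d bytes to read one %d-byte value\n",
    length($frozen), length($value);
```

A granular store (IPC::MM, or a DBM file) only touches the one entry,
which is why the file-based approaches benchmark so well.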

 I'm not saying I want entity beans here ;) It's just that I've been
 doing perl to pay for bills and stuff the past few years and see a lot
 of people having some (possibly perceived?) need for something missing
 in perl.

It may be that they just want someone to tell them how they should do
things.  J2EE does provide that to a certain degree.

 If I read your mail, you mention some solutions/directions for some
 problems I'm dealing with, but that's just my issue (I think; it's just
 coming to me): we have a lot of raw metal but we do have to do a lot
 of welding and fitting before we can solve our business problems.
 
 That is basically the point.

I don't think it's nearly that bad.  After my eToys article got 
published, I got several e-mails from people saying something like "we 
want to do this, but our boss says we have to buy something because of 
all the INFRASTRUCTURE code we would have to write."

Infrastructure?  What infrastructure?  The only stuff we wrote that was 
really independent of our application logic were things like a logging 
class and a singleton class, which can now be had on CPAN.  We wrote our 
own cache system, but that's because it worked in a very specific way 
that the available tools didn't handle.  I think I could do that with 
CPAN stuff now too.

 To illustrate that, I'll try to give a real-world example

Thanks, it's much easier to talk about specific situations.

 To deliver these messages, I send them off to another server (using my
 own invented pseudo-RMI to call a method on that server).

I would use HTTP for that, because I'm too lazy to write the RMI code 
myself.

 1. The server that does the delivery has plenty of threads (er, a
 Apache/mod_perl child) so I hope I have enough of them to deliver the
 messages at the rate the backend server generates them: one child might
 take up to 5 seconds to deliver the message but there are plenty of children.

 Not good. I've seen how this works and miserably fails when a delivery
 mechanism barfs.

If they were so quick to process that you could do it that way, I would
have just handled them in the original mod_perl server with a
cleanup_handler.  Obviously they are not, so that's not an option here.

 2. Same as 1 but I never allow one delivery mechanism to use all my
 Apache/mod_perl children by adding some form of IPC (darned, need to
 solve my data sharing issues first!)

I think they are already solved if you look at the modules I suggested.

 so the children check what the
 others are currently doing: if a request comes in for a particular
 delivery mechanism, I check if we're already doing N delivery attempts
 and drop the request somewhere (database/file, whatever) if not. I have
 a daemon running that monitors that queue.

I would structure it like this:
- Original server takes request, and writes it to a database table that
holds the queue.
- A cron job checks the queue for messages, reads the status from
MLDBM::Sync to see if we have free processes, and passes the request to
mod_perl if we do.  (Not that this could also be done with something
like PersistentPerl instead.)  If there are no free processes, they are
left on the queue.
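The cron job's core loop might look like the sketch below.  This is an
assumption-laden outline, not a tested dispatcher: the table, file and
URL names are invented, and DBI, MLDBM::Sync and LWP are assumed to be
installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Fcntl qw(:DEFAULT);
use MLDBM qw(DB_File Storable);   # choose DBM backend and serializer
use MLDBM::Sync;
use LWP::UserAgent;

my $MAX_WORKERS = 4;

# Shared worker-status file, written by the mod_perl children.
tie my %workers, 'MLDBM::Sync', '/var/run/app/workers.dbm',
    O_CREAT | O_RDWR, 0640 or die "tie: $!";

my $dbh = DBI->connect('dbi:mysql:app', 'user', 'pass',
    { RaiseError => 1 });
my $ua  = LWP::UserAgent->new;

my $queued = $dbh->selectall_arrayref(
    'SELECT id, payload FROM message_queue ORDER BY id');

for my $row (@$queued) {
    my $busy = grep { $workers{$_} } keys %workers;
    last if $busy >= $MAX_WORKERS;      # no free slots: leave on queue
    my ($id, $payload) = @$row;
    # Hand the message to a mod_perl child over plain HTTP.
    my $res = $ua->post('http://localhost/deliver', { id => $id });
    $dbh->do('DELETE FROM message_queue WHERE id = ?', undef, $id)
        if $res->is_success;
}
```

Throttling per delivery mechanism would go in the same loop: count busy
workers per mechanism instead of in total.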

 That daemon gets complicated quickly as it also has to throttle delivery
 attempts

My approach only puts that logic in the cron job.

 I need some form of persistent storage (with locking)

The relational database.  Or MLDBM::Sync if you prefer.

 what do
 I do when the delivery mechanism has failed for 6 hours and I have 12000
 messages in the queue *and* make sure current messages get sent in time?

I don't know, that's an application-specific choice.  Of course JMS
doesn't know either.

 3. I install qmail on the various servers, and use that to push messages
 around. This'll take me a week or so (hopefully) to get it running
 reliably in production

One of the major selling points for qmail is easier setup.  You could
use pretty much any mail server though if you have more experience with
something else.  I just like qmail because it's fast.

 Later on, I
 realise that for each message, a fullblown process is forked *per
 message*: load up perl, compile perl code etc..

I described how to avoid this in another message: use PersistentPerl or
equivalent, or pass 

Re: implementing a set of queue-processing servers

2002-11-25 Thread Rob Nagler
Matt Sergeant writes:
 There's a huge difference in what they are trying to achieve though. 
 POE doesn't open any files and it doesn't write any files to disk. None 
 of it is written in C (yet), so unless there's a buffer overrun or type 
 mismatch bug in perl you can exploit, you're not going to get in that 
 way.

I agree that Perl is a safe language (independent of taint, which
adds safety).  Unfortunately, there has been a history of insecure Perl
programs (formail.pl, I think being the most famous).  This may be
a consequence of bad programming, but you have to look at the
average if you are selecting a system without reviewing every line of
code, i.e., performing a security audit.

I trust Linux more than Apache, for example, because Linux is not only
older, but was built using an interface design which is 30 years old
and has been allowed to evolve.

 I'm not honestly suggesting it's bug free, but I fail to see how a bug 
 in POE would give you access to the system.

Use of a user string incorrectly in a system or open might do it.
Also, an incorrect chown, chmod, umask, etc.

 Now user code written on top of POE (or Apache) is another matter 
 altogether.

:)

Rob






Re: web security (was implementing a set of queue-processing servers)

2002-11-25 Thread Rob Nagler
Gunther Birznieks writes:
 I am not sure it is a bad example. It is an extreme example, so 
 therefore biased, but Apache is also a biased project because of 
 Apache's role in the Web.

Agreed.

 That's true. But if you have a collocation facility, you also don't have 
 an intranet on the other side like host based system. I don't really 
 consider collocated servers enterprise in the sense of having to link 
 with real systems like Sabre or Funds Transfer for a bank account, 
 medical records lookup, etc...

If you start looking around, you'll find a lot of companies are
trusting collocation facilities for large financial transactions.

 There really isn't a practical way to host these in a colocated facility 
 and still claim the same level of security you could architect otherwise.

So that's why Exodus went out of business!  ;-)

I think the problem of software security can be solved without
considering physical access.  Also, most corporate computer rooms are
probably less secure than commercial collocation facilities--at least
from my experience.

 But then it depends on your risk level. Personally, I like allowing VPN 
 in (eg SSH) for my own convenience. But I've yet to run into a bank (for 
 example) or large corporate with similar types of systems that allow SSH 
 in. Some even don't allow SSH out because they fear it as a channel 
 through which large amounts of proprietary data can be transferred by 
 internal employees.

But they run Wi-Fi without a problem. :-)

 The norm I think is to find SSH coming in from outside to be verboten 
 and SSH coming to the server from inside to be grudgingly OK and 
 usually only allowed through the firewall from some specific operational 
 hosts.

You are correct, but I don't think this actually solves a security
problem.   Otherwise, most companies wouldn't have problems with
virii.   They do, and that's the type of attack we are most likely to
run into.

 client to allow us that convenience. I've run into many dot.com startups 
 that allow us that convenience, but never a larger corporate especially 
 if their web services are more complex (eg granting limited access to 
 medical records).

On one job I had a large medical clinic *email* medical records for
test data.  These were real people's records.  I was shocked, but not
for long.  I have found that most IT policies are porous, esp. at the
top.  Consultants come in with their own laptops.  I've done this on
numerous occasions at large companies.

The point is that while some of the IT security policies stop a
certain amount of nuisance security problems, they don't normally
prevent crackers.  Usually, once inside, you can *telnet* from machine
to machine.  Never mind SMB.

 I think this is reasonable for a co-location center. But not for an 
 enterprise that has its own webserver. If the only thing protecting the 
 webserver from the intranet is iptables on the webserver itself, what 
 happens when someone breaks into the webserver? The next thing to go 
 would be iptables and then the machine is exposed to the rest of the 
 intranet.

Well, that's where good sysadmin comes in, and why you sandbox apache
(run it as a non-root user).

 You would need a machine between the web server and the intranet to 
 block access definitively.

This is impossible.

 I think iptables/ipchains is one of the coolest things to get free with 
 linux compared to other OSes, but also if the machine is a public 
 service machine and someone breaks into that public service, they can 
 disable security features for sure.

Yes, and Cisco installed IIS in its DSL routers.  Can you say Nimda?
Cisco is not perfect, and neither is Linux.

 If the FW is separate, even if they break into the web server, the 
 cracker can't all of a sudden open up a lot of other ports such as 
 conveniently allowing the web server to listen to telnet and FTP and 
 installing those services. The cracker would have to either disable the 
 web server and install FTP to listen on port 80 (thereby obviously 
 disrupting service) or install some CGI's that do the equivalent (which 
 won't be as convenient).

This doesn't make sense.  If Apache or POE is cracked and can run
arbitrary code, then anything can *go* *out* from the inside.  This
reverse tunneling is pretty much what DDoS tools do.  A FW usually
can't prevent a connection out to port 80 on a remote server.  That's
all you need to spread or attack.

 There are even those who advocate putting in two different firewalls 
 between the layers so if a bug is found in one firewall, the other 
 firewall will still hold the rules up. The logic is that the likelihood 
 of both firewalls having an exploit discovered at the same time is 
 extremely unlikely.

A good idea.  However, you also increase your risk when the software
runs on a general-purpose machine.  I think that's what we're talking
about: POE vs Apache on a server.  Multi-layered firewalls are fine,
but they can't stop a compromised machine from doing damage from the
inside.

Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)

2002-11-22 Thread Matt Sergeant
On Friday, Nov 22, 2002, at 02:49 Europe/London, Gunther Birznieks 
wrote:

I disagree. I think it depends on the protocol. A well designed 
protocol for an application will spread and stand the test of time. 
Sometimes the protocol doesn't have to be well designed, but just that 
it's standard can help tremendously.

eg if we were a world that said HTTP is it and we should do 
everything over HTTP, then would you really see SMTP over HTTP? SNMP 
over HTTP? telnet over HTTP? Why?

This doesn't really make sense to me.

[OT, because I know this isn't really your point]

As someone whose entire job revolves around SMTP these days, I'd love 
to see mail go over HTTP. SMTP's got no concept of negotiation. It's 
got little in the way of versioning (HELO vs EHLO). It's got no 
permanent redirect (e.g. [EMAIL PROTECTED] is now 
[EMAIL PROTECTED]). It's got very weak handling of binary 
data. Writing mail server plugins is very non-standardised.

Don't get me wrong, SMTP is a great protocol, but HTTP is sometimes 
just *so* much nicer :-)

Matt.



Re: web security (was implementing a set of queue-processing servers)

2002-11-22 Thread Gunther Birznieks


Rob Nagler wrote:


 

This isn't because more eyes looked at postfix than sendmail, but that 
the eye that designed postfix was a security-minded eye and his 
friends who are also security minded also likely had a hand in audit.
   


Sendmail is a bad example. ;-) I agree that quality does make a
difference.  I'm speaking about averages.
 

I am not sure it is a bad example. It is an extreme example, and 
therefore biased, but Apache is also a biased example because of 
its role in the Web.

 

What I mean is that if this were a secure site, you would never allow 
SSH to come in from the outside layers to the progressively internal 
layers. Connections should only be allowed from inside out.
   


When all I have is a collocation facility, there's no choice.  I've
got to come in through the front-end.
 

That's true. But if you have a collocation facility, you also don't have 
an intranet on the other side the way a host-based system does. I don't 
really consider collocated servers enterprise in the sense of having to 
link with real systems like Sabre or Funds Transfer for a bank account, 
medical records lookup, etc...

There really isn't a practical way to host these in a colocated facility 
and still claim the same level of security you could architect otherwise.

So if you have a separate firewall protected zone for the web
server,
   


Are you saying that the firewall protects your network, and defines
that as inside?
 

Well, not necessarily THE firewall, but a firewall or group of firewalls 
as external entities protecting access to and from various network 
partitions. Sometimes people call these DMZ or multi-DMZ, and others 
will say that the word DMZ is completely incorrect because each network 
segment really has its own rules.

 

The only thing the web server should have access to is the protocol and 
port to access the app server.
   


You have to be able to login.  I don't see how you would administer it
otherwise?
 

In an enterprise system, your operators would be on the inside of the 
LAN and be able to go from inside out. You are talking as if you are a 
3rd party vendor or at the collocation facility and therefore have to go 
from outside in. Outside in is always going to be less secure.

But then it depends on your risk level. Personally, I like allowing VPN 
in (eg SSH) for my own convenience. But I've yet to run into a bank (for 
example) or large corporate with similar types of systems that allow SSH 
in. Some even don't allow SSH out because they fear it as a channel 
through which large amounts of proprietary data can be transferred by 
internal employees.

The norm I think is to find SSH coming in from outside to be verboten 
and SSH coming to the server from inside to be grudgingly OK and 
usually only allowed through the firewall from some specific operational 
hosts.

Anyway, again, different hosts have different issues. For myself, just 
as you, I prefer convenience. But very few corporate clients I have 
allow us that convenience. In fact, I've never run into a corporate 
client that allows it. I've run into many dot.com startups that do, 
but never a larger corporate, especially if their web services are 
more complex (eg granting limited access to medical records).

And many times they are because they are useful. It's pretty rare to 
find a bare Apache.
   


If we are talking about enterprise systems, they had better be bare,
or the programmers/admins are not very good at what they do.  There's
no need to run inetd, popd, etc. on most systems.
 

By not bare I meant the Apache itself. For example, I don't think it's 
uncommon to find mod_proxy, mod_rewrite and mod_ssl on an Apache exposed 
as the front-end at minimum.

   

In a 3 tier application where the access to the app server and DB server 
are also protected by FWs, then if a cracker cracks the Apache web 
server, the fact that they have to crack the app server which is 
running a separate set of code (eg POE) is going to be a major
hindrance.
   


I like this quote:

   In the early 1990s, firewall pioneer Bill Cheswick described the
   network perimeter where he worked at Bell Labs as having a
   crunchy shell around a soft, chewy center.

We don't have any firewalls.  All machines run ipchains or iptables.
They run minimal configurations.  We only allow encrypted access
except for public Web servers.  Firewalls are a crutch for bad
security.  Your network has to be composed of jawbreakers.
 

I think this is reasonable for a co-location center. But not for an 
enterprise that has its own webserver. If the only thing protecting the 
webserver from the intranet is iptables on the webserver itself, what 
happens when someone breaks into the webserver? The next thing to go 
would be iptables and then the machine is exposed to the rest of the 
intranet.

You would need a machine between the web server and the intranet to 
block access definitively.

I think iptables/ipchains is one of the coolest things to get free with 
linux compared to other OSes, but also if the machine is a public 
service machine and someone breaks into that public service, they can 
disable security features for sure.

Re: web security (was implementing a set of queue-processing servers)

2002-11-21 Thread Rob Nagler
Gunther Birznieks writes:
 That will surely be easier than figuring out the vulnerabilities for 
 myself. Allowing an exploit to be posted will let me be a part-time 
 cracker and all I need to do is wait with a skeleton of injection code, 
 ready to strike when the exploit is publicized. But in the latter case 
 (finding vulnerabilities myself), I will likely have to make it my full 
 time job (either that or I would have to be a high school/college 
 student :))

I agree, but I think we have digressed into the realm of
motivation...

 Whereas, you will much less likely see a security audit on someone's EJB 
 or POE server because it is not on the front-line.

I'm sure Arthur Andersen sells such auditing services. :-)

 This isn't because more eyes looked at postfix than sendmail, but that 
 the eye that designed postfix was a security-minded eye and his 
 friends who are also security minded also likely had a hand in audit.

Sendmail is a bad example. ;-) I agree that quality does make a
difference.  I'm speaking about averages.

 What I mean is that if this were a secure site, you would never allow 
 SSH to come in from the outside layers to the progressively internal 
 layers. Connections should only be allowed from inside out.

When all I have is a collocation facility, there's no choice.  I've
got to come in through the front-end.

 So if you have a separate firewall protected zone for the web
 server,

Are you saying that the firewall protects your network, and defines
that as inside?

 The only thing the web server should have access to is the protocol and 
 port to access the app server.

You have to be able to login.  I don't see how you would administer it
otherwise?

 And many times they are because they are useful. It's pretty rare to 
 find a bare Apache.

If we are talking about enterprise systems, they had better be bare,
or the programmers/admins are not very good at what they do.  There's
no need to run inetd, popd, etc. on most systems.

 But in any case, I think my first and primary point has also been lost.

Never!  I treasure it.

 In a 3 tier application where the access to the app server and DB server 
 are also protected by FWs, then if a cracker cracks the Apache web 
 server, the fact that they have to crack the app server which is 
 running a separate set of code (eg POE) is going to be a major
 hindrance.

I like this quote:

In the early 1990s, firewall pioneer Bill Cheswick described the
network perimeter where he worked at Bell Labs as having a
crunchy shell around a soft, chewy center.

We don't have any firewalls.  All machines run ipchains or iptables.
They run minimal configurations.  We only allow encrypted access
except for public Web servers.  Firewalls are a crutch for bad
security.  Your network has to be composed of jawbreakers.

 However, even if you are thinking a cracker will discover 
 vulnerabilities from scratch, and you think it is easy to do so on 
 POE, I think you are still majorly hindering the cracker by having POE 
 exist.

AKA, security through obscurity.  It works, but I also think it breeds
bad designs, just like sessions in Web servers lead to laziness and
bad designs.

 In summary, I think it is much more plausible that your DB server (or 
 mainframe host) is toast if all you have as a layer in front of it is 
 Apache than if you have an App Server layer between the two.

Which is why we encrypt all critical data in our DB, and we start our
Web servers by hand with a long key.

Rob





Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-20 Thread Perrin Harkins
Aaron Johnson wrote:


This model has eased my testing as well: since I can run the script
completely external of the web server, I can run it through a debugger if
needed.



You realize that you can run mod_perl in the debugger too, right?  I use 
the profiler and debugger with mod_perl frequently.

- Perrin



Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-20 Thread Perrin Harkins
Aaron Johnson wrote:


I know you _can_ , but I don't find it convenient.



For me it's pretty much the same as debugging a command-line script.  To 
debug a mod_perl handler I just do something like this:

httpd -X -Ddebug

Then I hit the URL with a browser or with GET and it pops me into the 
debugger.  I have httpd.conf set up to add the PerlFixupHandler 
+Apache::DB line when it sees the debug flag.
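The httpd.conf arrangement Perrin describes could look roughly like this (a sketch; exact directives vary by mod_perl version):

```apache
# Start the server with: httpd -X -Ddebug
<IfDefine debug>
    # Load and initialize Apache::DB before any other Perl code runs
    <Perl>
        use Apache::DB ();
        Apache::DB->init;
    </Perl>
    # Drop into the debugger on every request
    <Location />
        PerlFixupHandler +Apache::DB
    </Location>
</IfDefine>
```

With that in place, hitting any URL from a browser or GET lands you at a debugger prompt in the single-process (-X) server.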

I still don't like to give apache long processes to manage, I feel this
can be better handled external of the server and in my case it allows
for automation/reports on non-mod_perl machines.



I try to code it so that the business logic is not dependent on a 
certain runtime environment, and then write a small mod_perl handler to 
call it.  Then I can use the same modules in cron jobs and such.  It can 
get tricky in certain situations though, when you want to optimize 
something for a long-running environment but don't want to break it for 
one-shot scripts.

- Perrin



Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-20 Thread Rob Nagler
Perrin Harkins writes:
 I try to code it so that the business logic is not dependent on a 
 certain runtime environment, and then write a small mod_perl handler to 
 call it.

I've been doing a lot of test-first coding.  It makes it so that you
start Apache, and the software just runs.  With sufficient granularity
of unit tests, we find that we don't use the debugger.  Run the test,
and it tells you what's wrong.

Rob





Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)

2002-11-20 Thread Rob Nagler
Gunther Birznieks writes:
  In the context of what you are saying, it seems as if everyone should 
 just stick to using TCP/IP/Telnet as a protocol and then the world would 
 be a better place.

Once upon a time, there was OSI, SNA, DECnet, etc.  Nowadays, all
computers talk IP, even if you connect from AOL.  Yes, the other
protocols are still around, but nobody in their right mind would
recommend them anymore.

 But I don't think this is so. Everyone ends up creating their own 
 protocols, their own algorithms on top of TCP on how to communicate.

Because it's FUN, and you probably can get a Ph.D. thesis out of it. ;-)

 In 
 a way it is simpler because you just have the freedom to create whatever 
 you want. But in another way, it is a nightmare because everyone will 
 just implement their own way of doing things. This can be OK in some 
 contexts, but I find it difficult to believe that this is the best thing 
 overall.

I'm not advocating this.  Rather, I am recommending using a
well-known, and arguably the most widely-used protocol:
application/x-www-form-urlencoded--and its near cousin
multipart/form-data.  However, those names are a mouthful, so we can
just call it HTTP,
and our implementation is LWP and Apache.
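The "protocol" Rob is pointing at is nothing more exotic than form encoding over HTTP. A quick round-trip sketch (in Python's stdlib rather than the LWP the list is using; the field names are invented):

```python
from urllib.parse import urlencode, parse_qs

# A request to a hypothetical booking service, expressed as form fields.
params = {"action": "fare_quote", "from": "DEN", "to": "SFO"}

# application/x-www-form-urlencoded: the body a browser (or LWP) would send.
body = urlencode(params)
print(body)  # action=fare_quote&from=DEN&to=SFO

# The server side decodes it back into a dict of value lists.
decoded = parse_qs(body)
```

Every application then layers its own meaning onto those key/value pairs, which is Rob's closing point: standard protocol, application-specific vocabulary.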

 At least with J2EE, for every major standard or protocol
 implemented, there is only one way to do it.  With Perl, you
 actually have more confusion because there are many more ways to do
 it. More ways to do templating, more ways to do middleware, more
 ways to do serialization of objects, etc...

There are an equivalent number of ways in both languages.  If you are
saying that you could build a standard component in, say, EJB, and
sell it, well, that's just not the case.  That's the pipe dream of
CORBA.  The only thing close to portable protocols is HTTP.  Sabre,
for example, gives you a library, and you have to interface to it.
However, authorize.net's interface is HTTP, and I can write my own
library in under 100 lines of Perl, which matches my application, and
doesn't require me to install anything.

There's such a thing as standard protocols, but every application uses
them differently.

Rob






Re: implementing a set of queue-processing servers

2002-11-20 Thread Rob Nagler
Gunther Birznieks writes:
 If you had an Apache server and a POE app server, what would a cracker 
 have an easier time trying to get in?

Assuming up-to-date code, POE, for sure.

 Probably the Apache server. Once broken through the Apache server, the 
 cracker would have to figure out that it is indeed a POE server on the 
 other end, and then to figure out an exploit by just trying as many 
 things as they can. ie they'd have to do a lot of extra work rather than 
 utilizing a public knowledge exploit someone else discovered.

All public knowledge exploits of Apache are fixed within days if not
hours.  It's the private ones I worry about.  There have to be more
of these in POE than Apache.  The more eyes, the fewer the defects.

 How? Why would any firewall admin allow SSH access from the outside 
 world to poke progressively inwards through the protected zones?

When we want to get to the middle tiers, we go in through the front
ends.  You need passwords at every level.  I'm not sure what you mean
here.

 I think this is correct. But as most servers that are transactions have 
 mod_ssl, I kind of consider mod_ssl and other modules as being fairly 
 core to Apache.

They have to be configured to be exploited.

Rob





asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-19 Thread Stephen Adkins
At 08:18 PM 11/18/2002 -0700, Rob Nagler wrote:

We digress.  The problem is to build a UI to Sabre.  I still haven't
seen any numbers which demonstrate the simple solution doesn't work.
Connecting to Sabre is no different than connecting to an e-commerce
gateway.  Both can be done by connecting directly from the Apache
child to the remote service and returning a result.

Hi,

My question with this approach is not whether it works for synchronous
execution (the user is willing to wait for the results to come back)
but whether it makes sense for asynchronous execution (the user will
come back and get the results later).

In fact, we provide our users with the option:

   1. fetch the data now and display it, OR
   2. put the request in a queue to be fetched and then later displayed

We have a fixed number of mainframe login id's, so we can only run a
limited number (say 4) of them at a time.

So what I think you are saying for option 2 is:

   * Apache children (web server processes with mod_perl) have two
 personalities:
   - user request processors
   - back-end work processors
   * When a user submits work to the queue, the child is acting in a
 user request role and it returns the response quickly.
   * After detaching from the user, however, it checks to see if fewer
 than four children are processing the queue and if so, it logs into
 the mainframe and starts processing the queue.
   * When it finishes the request, it continues to work the queue until
 no more work is available, at which time, it quits its back-end
 processor personality and returns to wait for another HTTP request.

This just seems a bit odd (and unnecessarily complex).
Why not let there be web server processes and queue worker processes
and they each do their own job?  Web servers seem to me to be for
synchronous activity, where the user is waiting for the results.

Stephen

P.S. Another limitation of the "use Apache servers for all server processing"
philosophy seems to be scheduled events or system events (those not
initiated by an HTTP request, which are user events).

example: Our system allows users to set up a schedule of requests to be run.
i.e. Every Tuesday at 3:00am, put this request into the queue.
This is a scheduled event rather than a user event.
How is a web server process going to wake up and begin processing this?
(unless of course everyone who puts something into the queue must send
a dummy HTTP request to wake up the web servers)






Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-19 Thread Perrin Harkins
Stephen Adkins wrote:


So what I think you are saying for option 2 is:

   * Apache children (web server processes with mod_perl) have two
 personalities:
   - user request processors
   - back-end work processors
   * When a user submits work to the queue, the child is acting in a
 user request role and it returns the response quickly.
   * After detaching from the user, however, it checks to see if fewer
 than four children are processing the queue and if so, it logs into
 the mainframe and starts processing the queue.
   * When it finishes the request, it continues to work the queue until
 no more work is available, at which time, it quits its back-end
 processor personality and returns to wait for another HTTP request.





This just seems a bit odd (and unnecessarily complex).



It does when you put it like that, but it doesn't have to be that way. 
I would separate the input (user or queue) from the processing part. 
You'd have a module that runs in mod_perl which knows how to process 
requests.  You have a separate module which can provide a UI for placing 
 requests.  Synchronous ones go straight to processing, while asynch 
ones get added to the queue.

You'd also have a controlling process that polls the queue and if it 
finds anything it uses LWP to send it to mod_perl for handling.  I would 
make this a tiny script triggered from cron if possible, since cron is 
robust and can handle outages and error reporting nicely.
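Perrin's controlling process really can be tiny. A sketch of the idea in Python rather than LWP (the spool directory and handler URL are made-up names):

```python
import os
import urllib.request

QUEUE_DIR = "/var/spool/myapp/queue"      # hypothetical spool directory
HANDLER_URL = "http://localhost/process"  # hypothetical mod_perl handler

def pending_jobs(queue_dir):
    """Return queued job files, oldest first, so request order is kept."""
    names = [os.path.join(queue_dir, n) for n in os.listdir(queue_dir)]
    return sorted(names, key=os.path.getmtime)

def submit(job_path):
    """POST one job to the mod_perl handler, which does the real work."""
    with open(job_path, "rb") as f:
        req = urllib.request.Request(HANDLER_URL, data=f.read())
        urllib.request.urlopen(req)
    os.remove(job_path)  # dequeue only after a successful response

if __name__ == "__main__" and os.path.isdir(QUEUE_DIR):
    # Run from cron; cron handles restarts and mails errors on failure.
    for job in pending_jobs(QUEUE_DIR):
        submit(job)
```

A crontab entry running this every minute gives you the outage handling and error reporting Perrin mentions essentially for free.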

Why not let there be web server processes and queue worker processes
and they each do their own job?  Web servers seem to me to be for
synchronous activity, where the user is waiting for the results.



When I think of queue processing, I think of a system for handling tasks 
in parallel that provides a simple API for plugging in logic, a 
well-defined control interface, logging, easy configuration... sounds 
like Apache to me.  You just need a tiny control process to trigger it 
via LWP.  Apache is already a system for handling a queue of HTTP 
requests in parallel, so you just have to make your requests look like HTTP.

You certainly could do this other ways, but you'd probably have to write 
a lot more code or else use something far less reliable than Apache.

P.S. Another limitation of the "use Apache servers for all server 
processing" philosophy seems to be scheduled events or system events 
(those not initiated by an HTTP request, which are user events).


Cron/at + LWP.

- Perrin




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-19 Thread Valerio_Valdez Paolini

Hi Stephen,

On Tue, 19 Nov 2002, Stephen Adkins wrote:

 My question with this approach is not whether it works for synchronous
 execution (the user is willing to wait for the results to come back)
 but whether it makes sense for asynchronous execution (the user will
 come back and get the results later).

What kind of interface will you provide to the final users?

 In fact, we provide our users with the option:

1. fetch the data now and display it, OR
2. put the request in a queue to be fetched and then later displayed

 We have a fixed number of mainframe login id's, so we can only run a
 limited number (say 4) of them at a time.

So it is possible that an immediate request is queued if the system has
already reached its maximum allowed logins. In other words, a final user
can request to display data immediately but your middleware can answer that
the request has been queued, possibly saying 'your job id, position n, see
you later'.

Moreover, you must preserve order of requests. And if I recall correctly you
talked about a sort of queue listing and some job manipulation. Whatever
will be your choice, you undoubtedly need to serialize requests and enqueue
them using a dbms. It is the simplest approach.
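Valerio's dbms-backed queue can be sketched in a few lines (Python/sqlite3 here purely for illustration; the table and column names are invented):

```python
import sqlite3

# In-memory DB for the sketch; a real system would use a shared DB server.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE queue (
    job_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- also preserves order
    owner    TEXT,
    request  TEXT,
    status   TEXT DEFAULT 'queued')""")

def enqueue(owner, request):
    """Add a request; return (job_id, position in queue) for the user."""
    cur = db.execute("INSERT INTO queue (owner, request) VALUES (?, ?)",
                     (owner, request))
    pos = db.execute("SELECT COUNT(*) FROM queue WHERE status = 'queued'"
                     ).fetchone()[0]
    return cur.lastrowid, pos

def dequeue():
    """Hand the oldest queued job to a worker, marking it running."""
    row = db.execute("SELECT job_id, request FROM queue "
                     "WHERE status = 'queued' ORDER BY job_id LIMIT 1"
                     ).fetchone()
    if row:
        db.execute("UPDATE queue SET status = 'running' WHERE job_id = ?",
                   (row[0],))
    return row
```

The same two calls back both the web-facing "your job id, position n" message and the external worker that drains the queue.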

Given a method to add, list or remove requests from this kind of queue,
mod_perl (even plain cgi scripts) can use these methods to manipulate a 
user's job. Using access control supplied by Apache, it is possible to give
different access rights to users of the middleware.

Requests from final users will always be enqueued by an Apache child, 
which will get a job-id and its position in the queue. If the job is on top
of the queue, you will immediately wait for its completion. Otherwise you
can tell the user to check his job queue later.
Users can remove jobs from the queue.

All completed jobs will be stored somewhere (file system and/or db) and can
be listed by legitimate users. Jobs completed will show in a separate queue.

An external entity will dequeue jobs and process them, probably using
something like Parallel::ForkManager to limit concurrent requests.
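The dequeue entity's concurrency cap (a fixed pool of mainframe logins) is exactly what Parallel::ForkManager provides in Perl by limiting forked children. The same idea sketched in Python, with a semaphore standing in for the login pool and a sleep standing in for the mainframe work:

```python
import threading
import time

MAX_LOGINS = 4                       # fixed number of mainframe login ids
logins = threading.BoundedSemaphore(MAX_LOGINS)
lock = threading.Lock()
active = 0                           # logins in use right now
peak = 0                             # high-water mark, to show the cap holds

def run_job(job_id):
    """Process one queued request while holding one of the logins."""
    global active, peak
    with logins:                     # blocks when all logins are in use
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)             # stand-in for talking to the mainframe
        with lock:
            active -= 1

threads = [threading.Thread(target=run_job, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Twenty queued jobs run, but never more than four concurrently, however many workers you spawn.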

Another entity will enqueue recurring jobs. Jobs scheduled for future
processing should always be enqueued immediately, or I can't imagine
a coherent interface to remove jobs.

These entities look like daemons, that can be spawned and controlled using
code executed by Apache.

Please note that I never mentioned html, using Apache as your
infrastructure you can build whatever interface you need.

Requests recorded inside db can also be used to implement a cache,
probably reused by following requests.
It would be possible to collapse identical requests (to save logins).

Obviously it is possible to replace Apache with POE or Stem, but I don't
know how, sorry. There are many other solutions, but this sketch describes
my way to do it. Sorry for the length of this message.

Ciao, Valerio


 Valerio Paolini, http://130.136.3.200/~paolini
--
 Linux, the Cheap Chic for Computer Fashionistas




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-19 Thread Aaron Johnson
On Tue, 2002-11-19 at 16:28, Stephen Adkins wrote:
 At 08:18 PM 11/18/2002 -0700, Rob Nagler wrote:
 
 We digress.  The problem is to build a UI to Sabre.  I still haven't
 seen any numbers which demonstrate the simple solution doesn't work.
 Connecting to Sabre is no different than connecting to an e-commerce
 gateway.  Both can be done by connecting directly from the Apache
 child to the remote service and returning a result.
 
 Hi,
 
 My question with this approach is not whether it works for synchronous
 execution (the user is willing to wait for the results to come back)
 but whether it makes sense for asynchronous execution (the user will
 come back and get the results later).
 
 In fact, we provide our users with the option:
 
1. fetch the data now and display it, OR
2. put the request in a queue to be fetched and then later displayed
 
 We have a fixed number of mainframe login id's, so we can only run a
 limited number (say 4) of them at a time.
 
 So what I think you are saying for option 2 is:
 
* Apache children (web server processes with mod_perl) have two
  personalities:
- user request processors
- back-end work processors
* When a user submits work to the queue, the child is acting in a
  user request role and it returns the response quickly.
* After detaching from the user, however, it checks to see if fewer
  than four children are processing the queue and if so, it logs into
  the mainframe and starts processing the queue.
* When it finishes the request, it continues to work the queue until
  no more work is available, at which time, it quits its back-end
  processor personality and returns to wait for another HTTP request.
 
 This just seems a bit odd (and unnecessarily complex).
 Why not let there be web server processes and queue worker processes
 and they each do their own job?  Web servers seem to me to be for
 synchronous activity, where the user is waiting for the results.
 

I am doing something similar right now in a project.  It has to make
approx. 220 requests to outside sources in order to compile a completed
report. These reports vary in time to create based on the data sources
and network traffic.  This is the solution I have in place currently:

1) User visits web page (handled by mod_perl) and they make the request
for a report.

2) The request parameters are stored into a temp file and the user is
redirected to a wait page.  The time spent on the wait page varies and
an approx time is created based on query complexity. The user session is
given a key that matches the temp file name.

3) A separate dedicated server (Proc::Daemon based) picks up the temp
file and spawns a child to process it.  This daemon looks for new temp
files every X seconds, where X is currently 15 seconds, but it could
easily be adjusted.  It keeps a queue of the temp files that have been
processed and drops them from the queue after 45 minutes even if they
haven't run.

4) The child recreates the users object and runs the report, when it
completes it deletes the temp file.  If it fails to complete the temp
file remains.

5) When the auto refresh takes place the system determines if the users
request has completed by looking for the temp file named in their
session data.  If the file exists they are given another wait page with
a 30 to 120 second wait time.  If it doesn't exist then the cached
information from the report, just an XML file created from a XML::Simple
dump of the hash containing the report data, is processed and presented
as HTML to the user.
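The scanning logic in step 3 might look like this (a sketch only; Aaron's daemon is Proc::Daemon-based Perl, and the names here are invented):

```python
import os
import time

SCAN_INTERVAL = 15        # seconds between directory scans
EXPIRY = 45 * 60          # forget jobs still pending after 45 minutes

def scan(spool_dir, seen, now=None):
    """Return temp files not yet dispatched; expire stale entries.

    `seen` maps path -> time first noticed.  The daemon loop would call
    this every SCAN_INTERVAL seconds and fork a child per returned path.
    """
    now = time.time() if now is None else now
    # Drop entries older than 45 minutes, even if they never ran.
    for path, first_seen in list(seen.items()):
        if now - first_seen > EXPIRY:
            del seen[path]
    new = []
    for name in os.listdir(spool_dir):
        path = os.path.join(spool_dir, name)
        if path not in seen:
            seen[path] = now
            new.append(path)
    return new
```

The child then deletes the temp file on success, so its continued existence is exactly the "report not ready" signal the wait page checks for in step 5.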

I had attempted using a mod_perl only solution, but I didn't like tying
up the server with additional processing that could be handled
externally.  This method also allows for the server script to reside on
a separate machine (allowing for some shared filesystem samba, NFS etc)
without having to recreate an entire mod_perl environment.

This model has eased my testing as well: since I can run the script
completely external of the web server, I can run it through a debugger if
needed.  I also use the same script for nightly automated common reports
to limit the number of real time requests, since the data doesn't change
that frequently in my case.

 Stephen
 
 P.S. Another limitation of the "use Apache servers for all server processing"
 philosophy seems to be scheduled events or system events (those not
 initiated by an HTTP request, which are user events).
 

I agree with Perrin, you can use LWP to emulate a users HTTP request if
you want to use an HTTP style request. 

cron/at represents the best way to handle this (IMHO).  In my case I run
the cron job and it generates the temp files; these temp files get
picked up by the looping server (simple non mod_perl daemon) and
processed. So I don't use LWP, but could send the request to the server
and have it create the temp files just as easily; I just happen to have
the logic abstracted to where I don't need to involve mod_perl.

Aaron 

Re: implementing a set of queue-processing servers

2002-11-19 Thread Gunther Birznieks


Rob Nagler wrote:


   

My experience is just the opposite.  If you reuse code, most servers
contain that code base and are therefore large relative to very
specific applications.  Most of our mod_perl servers are 15MB minimum,
and grow to up to 80MB.
 

But what if the code is not meant to be reused except within the context 
of what you are processing. But from what you are saying, I suppose I 
would agree that the integration point with Sabre is pretty much 
straight through and not much custom logic.

The problem is sharing, routing, load balancing, etc.  If you run
separate processes, there's no chance to share.  If you run separate
processes, you need more configuration, documentation thereof, and
design for peak load becomes more difficult.


I think that is a good point.


 

Maybe this could be segmented using a reverse proxy to make the 
difference between whether it goes to a mod_perl process that talks to 
Sabre and one that does other app stuff.
   


This stuff is handled on the back end.  I don't see where proxies
would help.
 

I had mentioned the above in the context of providing an alternative in 
case there is no backend.

But alternatively, if the Perl code that does the logic of talking to 
Sabre and massaging it for a mod_perl app is split out into POE or 
PerlRPC, then it would be better to have the 5 middleware processes 
dealing with the shared-memory stuff.
   


POE and PerlRPC are fatter than the Sabre code, I bet.
 

Yeah, I think you are right. I've not used POE, PerlRPC is fairly thin 
but you are right that it is fatter. So I think this is also a good point.

Apache is better than IIS but I would not call it secure in an 
absolute sense. There have been plenty of exploits for Apache over the 
last year that give me headaches having to patch ASAP when discovered.
   


Is POE or PerlRPC more secure than Apache? I seriously doubt it.
 

That could be, but how many exploits of POE or PerlRPC or core Perl 
(which would also be exploitable) have been posted on Bugtraq in the last 
year?  Very few, if any (zero, I think?).  What about Apache?  Definitely 
not zero.

This doesn't mean that POE is more secure than Apache, but it does mean 
that there are fewer publicized exploits.  If POE became a popular 
web server, certainly more people would be actively trying to break it, 
and perhaps they would find more exploits.

So...

If you had an Apache server and a POE app server, which would a cracker 
have an easier time getting into?

Probably the Apache server.  Having broken through the Apache server, the 
cracker would have to figure out that it is indeed a POE server on the 
other end, and then find an exploit by trying as many things as they 
can, i.e., they'd have to do a lot of extra work rather than using a 
publicly known exploit someone else discovered.

 

If you are a cracker and have hacked someone's Apache, but then your 
next crack has to find an exploit in a daemon written in Perl like POE 
before finally getting to the database or backend system, you are still 
slowing down your attacker. Usually at worst, the attacker will have to 
figure out something about how POE works.
   


The cracker will go to the OS.  They aren't going to proxy-hop.
They'll try a telnet, ssh, dns, etc. exploit once they are on the
inside.
 

Sure, if you run them in the same DMZ or on the same server.  The 
assumption is that the application server is in a separate zone which 
only allows requests in from the web server and requests out to the DB 
server or other resources.  If security is a concern, I wouldn't see 
someone dumping all the code they are running and the database on the 
same machine, because then if they get into Apache, they have the keys 
to the kingdom.

I believe more script kiddies/casual crackers can probably log into 
sybase, oracle, mysql databases and trash them than they can figure out 
how to talk to an RMI engine, EJB server, SOAP, or POE middleware for an 
application layer prior to accessing the database.
   


If this is a large scale app, there will be front-ends, middle tiers,
and databases.  If they crack ssh, they're through the system.  If


How? Why would any firewall admin allow SSH access from the outside 
world to poke progressively inwards through the protected zones?

they crack Apache, they still have to find a specific exploit.
Middle tiers run mod_perl, but not mod_proxy and mod_ssl.  Front ends
run mod_proxy and mod_ssl, but not mod_perl.  The cracks on Apache
have all been on specific modules, not on the Apache core.  The weak
link is not Apache or the middleware, because the connections, as you
point out, are too complex.
 

I think this is correct.  But as most servers that handle transactions 
run mod_ssl, I kind of consider mod_ssl and other such modules to be 
fairly core to Apache.

We digress.  The problem is to build a UI to Sabre.  I still haven't


It is a digression, but also an important one. Security of external and 

Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)

2002-11-19 Thread Gunther Birznieks


Rob Nagler wrote:


The antithesis of this is J2EE, which introduces an amazing amount of
complexity through protocol explosion (is it a Message/Session/Entity
Bean, do I use JMX, JMS, RMI, etc.).  It creates tremendous confusion,
and their software is certainly less reliable than Apache.

 

I think this is not a fair statement about J2EE (except the less 
reliable part).

In the context of what you are saying, it seems as if everyone should 
just stick to using TCP/IP/Telnet as a protocol and then the world would 
be a better place.

But I don't think this is so.  Everyone ends up creating their own 
protocols, their own algorithms on top of TCP for how to communicate.  In 
a way that is simpler, because you have the freedom to create whatever 
you want.  But in another way it is a nightmare, because everyone will 
just implement their own way of doing things.  This can be OK in some 
contexts, but I find it difficult to believe that it is the best thing 
overall.

At least with J2EE, for every major standard or protocol implemented, 
there is only one way to do it. With Perl, you actually have more 
confusion because there are many more ways to do it. More ways to do 
templating, more ways to do middleware, more ways to do serialization of 
objects, etc...





Re: implementing a set of queue-processing servers

2002-11-18 Thread Valerio_Valdez Paolini

On Sat, 16 Nov 2002, Stephen Adkins wrote:

 Are you also interested in fault tolerance and accuracy of computation?
 And what about caching?

 accuracy of computation? of course. but this would seem to me to be
 a matter of program logic.

 caching? not sure what you mean, but caching is good if it increases
 performance without creating synchronization problems.

Your last messages clarified your situation and needs, so I think caching
is not relevant here.

Btw, what you described is covered in some architectural design patterns
that I was studying when your question arrived in my mailbox.

Ciao, Valerio

 Valerio Paolini, http://130.136.3.200/~paolini
--
 Linux, the Cheap Chic for Computer Fashionistas




Re: implementing a set of queue-processing servers

2002-11-18 Thread Gunther Birznieks


Rob Nagler wrote:


Gunther Birznieks writes:
 

Also, I suspect it probably wouldn't be efficient memory wise. mod_perl 
processes are large enough with front-end code without randomly having 
them share a bunch of middleware/mainframe processing code also. This 
middleware code could probably be more tightly shared amongst a smaller 
number of processes that just service the mainframe stuff.
   


The sharing should be identical, esp. if the code is written in C or
C++ which most middleware (and probably Sabre) is written in.
 

1) Likely there will be massaging of data relative to the application at 
hand, so it wouldn't be a pure C wrapper.
2) In Perl, touching data slowly unshares the copy-on-write pages, so 
shared memory breaks down after a while.

For these two reasons, it seems to me that you won't really get that 
much shared memory, so it would still be better to limit the code to as 
few engines as possible instead of the universe of engines that the 
application can access.

So if 30 mod_perl engines are needed for the application, but only 5 at 
any given time are accessing the reservation system, then only 5 engines 
should be going to the reservation system.

Maybe this could be segmented using a reverse proxy to decide whether a 
request goes to a mod_perl process that talks to Sabre or to one that 
does other app stuff.  But alternatively, if the Perl code that does the 
logic of talking to Sabre and massaging it for a mod_perl app is split 
out into POE or PerlRPC, then it would be better to have the 5 middleware 
processes dealing with the shared-memory stuff.

 

In addition, I would advocate middleware prior to talking to a mainframe 
because of security.  Someone can break into the web server, and if it is 
hooked directly to the mainframe, then that person can hop straight 
onto the mainframe.  Instead, the requests could be mediated and 
well-formed by the middleware.  The cracker would have to hack the 
middleware after hacking the web server in order to get to the mainframe 
if you add a layer like this.  Of course, maybe this is an Intranet 
application, so such things may not matter...
   


Security is always a concern, which is why Apache is a much better
solution.  It's much like the difference between buying security from a
company which builds public ATMs and buying it from a company which
builds corporate laptop security systems.  The former is like Apache,
the latter is like most (if not all) middleware.  Apache has stood the
test of time, because it is being attacked *continuously*.  This is why
Apache is so much more secure than IIS, which had a much later start and
wasn't used for large sites.

 

Apache is better than IIS but I would not call it secure in an 
absolute sense. There have been plenty of exploits for Apache over the 
last year that give me headaches having to patch ASAP when discovered.

If you are a cracker and have hacked someone's Apache, but then your 
next crack has to find an exploit in a daemon written in Perl like POE 
before finally getting to the database or backend system, you are still 
slowing down your attacker. Usually at worst, the attacker will have to 
figure out something about how POE works.

I believe more script kiddies/casual crackers can probably log into 
sybase, oracle, mysql databases and trash them than they can figure out 
how to talk to an RMI engine, EJB server, SOAP, or POE middleware for an 
application layer prior to accessing the database.

Later,
  Gunther

 



Re: implementing a set of queue-processing servers

2002-11-18 Thread Rocco Caputo
On Fri, Nov 15, 2002 at 03:53:53PM -0500, Stephen Adkins wrote:
 At 02:09 PM 11/15/2002 -0500, Rocco Caputo wrote:
 On Fri, Nov 15, 2002 at 11:45:33AM -0500, Stephen Adkins wrote:
 
  QUESTIONS:
  
   * What queue mechanism would you use, assuming all of the
 writers and readers are on the same system?
 (IPC::Msg? MsgQ?)
 
 If speed is a major factor, I would use a FIFO (named pipe).  This is
 a very lightweight and fast way to pass data between processes on the
 same machine.
 
 Are FIFO's (named pipes) on Unix guaranteed to maintain the integrity
 of the messages in the case of multiple writers?
 I think you could guarantee this if you imposed restrictions on the
 data travelling through the pipe: i.e. single text line, must be
 written in a single (unbuffered) write() system call.
 Otherwise, doesn't a FIFO break down as a message queue when you have
 multiple writers with arbitrarily long message data?

According to _Advanced Programming in the UNIX Environment_, the
largest atomic write is PIPE_BUF bytes.  On FreeBSD,
/usr/include/limits.h defines PIPE_BUF as 512 bytes.

APUE also says:

  Indeed, the normal file I/O functions (close, read, write, unlink,
  etc.) all work with FIFOs.

This leads me to believe that flock() could protect the integrity of
large FIFO writes.  I've never had the occasion to need it and can't
say for sure.
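A minimal sketch of the PIPE_BUF rule under discussion: one short, newline-terminated record per unbuffered write stays atomic, while anything larger would need flock() or similar.  The path and message are placeholders, and a plain file stands in for the FIFO so the example doesn't block waiting for a reader.

```perl
use strict;
use warnings;

# POSIX guarantees PIPE_BUF is at least 512 bytes; writes of at most
# PIPE_BUF bytes to a FIFO are atomic even with multiple writers.
use constant PIPE_BUF_MIN => 512;

sub enqueue {
    my ($path, $msg) = @_;            # $path would be a mkfifo()'d FIFO
    $msg .= "\n";
    die "record too large for one atomic FIFO write\n"
        if length($msg) > PIPE_BUF_MIN;
    open my $fh, '>>', $path or die "open $path: $!";
    syswrite $fh, $msg or die "write $path: $!";   # one write() syscall
    close $fh or die "close $path: $!";
}

# Demonstration on a plain file: opening a real FIFO for writing blocks
# until a reader appears, which would hang a standalone example.
unlink '/tmp/queue-demo.txt';
enqueue('/tmp/queue-demo.txt', 'job 42');
```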

-- Rocco Caputo - [EMAIL PROTECTED] - http://poe.perl.org/



Re: implementing a set of queue-processing servers

2002-11-17 Thread Rob Nagler
Stephen Adkins writes:
 However, I have been thinking about asynchronous execution, queues, 
 and queue-working, and I wanted to get a handle on how best I should
 solve the problem in a general way.

I guess this is where we diverge.  It sounds like you have a
specific problem.  Generalizing at this stage is going to be a mistake
imiho.

Rob





Re: implementing a set of queue-processing servers

2002-11-16 Thread Stephen Adkins
At 02:09 PM 11/15/2002 -0500, Rocco Caputo wrote:
On Fri, Nov 15, 2002 at 11:45:33AM -0500, Stephen Adkins wrote:

 QUESTIONS:
 
  * What queue mechanism would you use, assuming all of the
writers and readers are on the same system?
(IPC::Msg? MsgQ?)

If speed is a major factor, I would use a FIFO (named pipe).  This is
a very lightweight and fast way to pass data between processes on the
same machine.

Are FIFO's (named pipes) on Unix guaranteed to maintain the integrity
of the messages in the case of multiple writers?
I think you could guarantee this if you imposed restrictions on the
data travelling through the pipe: i.e. single text line, must be
written in a single (unbuffered) write() system call.
Otherwise, doesn't a FIFO break down as a message queue when you have
multiple writers with arbitrarily long message data?

  * How about if the queue writers were distributed, but the
queue readers were all on one machine? (RPC to insert into
the above-mentioned local queues?)
  * How about if the queue writers and queue readers were all
distributed around the network? (Spread::Queue::FIFO?
Parallel::PVM? Parallel::MPI? MQSeries::Queue?)

Your requirement #2 seems to indicate that the queue is held in a
database table.  In that case the queue is inherently distributable.
Each machine makes its own connections to the database and processes
tasks in the queue using whatever locking is necessary.

Yes. In this case, you are right.
The only thing that's missing is the wakeup to the servers
so that they do not need to poll.

This requires queue workers to poll the database for new jobs, which
you later state is something you're trying to avoid.


 MY HUNCHES
 
 I think I'll use IPC::Msg as the queue because the queue readers
 will all be on one machine.  I'll also have to implement a simple RPC
 server (using Net::Server) to perform remote insertions into the 
 local queue.  If this seems too rough, I'll probably install the
 Spread Toolkit and use Spread::Queue.
 
 I currently think I'll keep working with Net::Server to see if I
 can use it to process a queue rather than listen on a network port,
 but I'm not sure that this is the right use of the module.
 I may end up ditching this effort and just have a set of parallel
 servers all waiting on the queue.  The queue mechanism itself will
 work out who gets to work on which request.
 
 Any input?

Depending on how critical your transactions are, it may be more
reliable to use the database as the queue.  Jobs passed through it are
saved to persistent storage, making them more likely to survive a
crash.  Do you need to roll forward unprocessed tasks if you must
restart the server?

Crash resistance is an important consideration for queues and queue
workers in general.  In this case, because it is primarily a read-only
decision support system, if we had a system crash, the loss of requests
in the queue would be the least of our worries.

If you use the database as the queue, the message passing between
clients and servers amounts to little more than a wake-up call: "Hey,
you've got a task!"

You are right.

That is in fact all my queue needs to do: say "Hey, you've got a task"
in order to eliminate polling when there is no work to do and to wake up
the server immediately when there is work to do.

I might almost use a signal.  I would just need to IGNORE the signal
while the server is running and reset the signal handler when the server
is about to go back to sleep.
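Stephen's arm-then-ignore idea can be sketched as below; the process signals itself only to keep the example self-contained, and SIGUSR1 is an assumed choice of wakeup signal.

```perl
use strict;
use warnings;

# Arm a handler while idle; ignore the signal while working, since a
# worker that is already awake doesn't need another wakeup.
my $wakeup = 0;
$SIG{USR1} = sub { $wakeup = 1 };   # armed: the server is "asleep"

kill 'USR1', $$;                    # a client announces new work

# Woken up: ignore further wakeups while draining the queue.
$SIG{USR1} = 'IGNORE';
print "woken: $wakeup\n";
# ... process_queue() would run here, then re-arm the handler and sleep ...
```

In the real server the `sleep` would sit between re-arming the handler and the next queue drain, so a signal arriving mid-sleep wakes the worker immediately.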

However, I have been thinking about asynchronous execution, queues, 
and queue-working, and I wanted to get a handle on how best I should
solve the problem in a general way.

Stephen








Re: implementing a set of queue-processing servers

2002-11-16 Thread Stephen Adkins
At 08:59 PM 11/15/2002 +0100, Valerio_Valdez Paolini wrote:

On Fri, 15 Nov 2002, Stephen Adkins wrote:

 You seem to advocate Apache/mod_perl for end-user (returning HTML)
 and server-to-server (RPC) use.  That makes sense.
 But it doesn't seem to make sense for my family of servers that
 spend all of their time waiting for the mainframe to return their
 next transaction.

Are you also interested in fault tolerance and accuracy of computation?
And what about caching?

In my case, we have enough single points of failure in the system
that fault tolerance for this component is not critical.

However, I am interested in general in the solution to the 
[asynchronous execution + queue + queue worker] problem along
with all of the issues related to scalability and reliability.

accuracy of computation? of course. but this would seem to me to be
a matter of program logic.

caching? not sure what you mean, but caching is good if it increases
performance without creating synchronization problems.

Stephen





implementing a set of queue-processing servers

2002-11-15 Thread Stephen Adkins
Hi,

I have the following requirement, and I am seeking your input.

 1. web-based users make requests for data which are put
in a queue
 2. these requests and their status need to be in a database
so that users can watch the status of the queue and their
requests in the queue
 3. a set of servers process the requests for data from the
queue and put the results in a results table so that
users can view their data when their requests are done

QUESTIONS:

 * What queue mechanism would you use, assuming all of the
   writers and readers are on the same system?
   (IPC::Msg? MsgQ?)
 * How about if the queue writers were distributed, but the
   queue readers were all on one machine? (RPC to insert into
   the above-mentioned local queues?)
 * How about if the queue writers and queue readers were all
   distributed around the network? (Spread::Queue::FIFO?
   Parallel::PVM? Parallel::MPI? MQSeries::Queue?)

 * What Perl server-building software on CPAN do you recommend,
   and why? (Net::Server? Net::Daemon? POE?)
   I started working with Net::Server, but it seems focused
   on being a network request multi-server, not a queue working
   multi-server.
 * Would you implement it as many peer-level servers waiting on
   a single queue? or a single parent server waiting on the queue,
   dispatching queued work units to waiting child servers?

QUICK AND DIRTY SINGLE-SERVER SOLUTION

I implemented a quick-and-dirty single-server solution, where
I use a single server to process requests.  I simply poll the
request table in the database once a minute for new requests,
and if they exist, I process them.

Now I am looking to upgrade this for higher throughput (multiple
parallel servers), lower background load (no polling during quiet
periods), and lower latency (immediate response to queue insertion
rather than waiting for the next poll interval).
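The polling worker described above, plus the planned move to multiple parallel servers, can be sketched with a database-backed claim.  The request table, its columns, and the in-memory SQLite database are hypothetical stand-ins; the point is that a single conditional UPDATE lets parallel workers claim rows without colliding.

```perl
use strict;
use warnings;
use DBI;   # assumes DBD::SQLite is available for the demonstration

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do(q{CREATE TABLE request (id INTEGER PRIMARY KEY, status TEXT)});
$dbh->do(q{INSERT INTO request (status) VALUES ('NEW')});

# Claim the oldest NEW request; the UPDATE only succeeds if the row is
# still NEW, so two workers can never claim the same request.
sub claim_next_request {
    my ($id) = $dbh->selectrow_array(
        q{SELECT id FROM request WHERE status = 'NEW' ORDER BY id LIMIT 1});
    return undef unless defined $id;
    my $n = $dbh->do(
        q{UPDATE request SET status = 'RUNNING'
             WHERE id = ? AND status = 'NEW'}, undef, $id);
    return $n == 1 ? $id : undef;   # undef: another worker won the race
}

my $id = claim_next_request();
print defined $id ? "claimed request $id\n" : "queue empty\n";
```

A worker would loop: claim, process, mark the row DONE in the results table, and sleep (or wait for a wakeup) when claim_next_request returns undef.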

MY HUNCHES

I think I'll use IPC::Msg as the queue because the queue readers
will all be on one machine.  I'll also have to implement a simple RPC
server (using Net::Server) to perform remote insertions into the 
local queue.  If this seems too rough, I'll probably install the
Spread Toolkit and use Spread::Queue.

I currently think I'll keep working with Net::Server to see if I
can use it to process a queue rather than listen on a network port,
but I'm not sure that this is the right use of the module.
I may end up ditching this effort and just have a set of parallel
servers all waiting on the queue.  The queue mechanism itself will
work out who gets to work on which request.

Any input?

Stephen





Re: implementing a set of queue-processing servers

2002-11-15 Thread Rob Nagler
Stephen Adkins writes:
 QUICK AND DIRTY SINGLE-SERVER SOLUTION
 
 I implemented a quick-and-dirty single-server solution, where
 I use a single server to process requests.  I simply poll the
 request table in the database once a minute for new requests,
 and if they exist, I process them.
 
 Now I am looking to upgrade this for higher throughput (multiple
 parallel servers), lower background load (no polling during quiet
 periods), and lower latency (immediate response to queue insertion
 rather than waiting for the next poll interval).

I like this solution.  Are you finding performance problems?  One
thing you can do is execute process_queue right after doing the insert.
Remember that databases are great at assuring atomicity and
persistence.

Apache/mod_perl is the best all-around application server available.
What's simpler than an LWP request with a URL and a return of a Perl
(or XML, if the client is another language) data structure?  Run eval
(or XML::Parser) and off you go.  You can wrap this, but how many
message types do you have?
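Rob's "return a Perl data structure and eval it" round trip can be sketched as follows; the response contents are invented, and in real use you would only eval replies from a server you trust (or use a Safe compartment).

```perl
use strict;
use warnings;
use Data::Dumper;

# Server side: serialize the reply as Perl source text.
local $Data::Dumper::Terse  = 1;   # emit '{...}', not '$VAR1 = {...}'
local $Data::Dumper::Indent = 0;
my $reply_text = Dumper({ status => 'ok', rows => [ 1, 2, 3 ] });

# Client side: eval the text back into a data structure.
my $reply = eval $reply_text;
die "bad response: $@" if $@;
print "status=$reply->{status} rows=", scalar @{ $reply->{rows} }, "\n";
# prints "status=ok rows=3"
```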

Rob





Re: implementing a set of queue-processing servers

2002-11-15 Thread Rob Nagler
Stephen Adkins writes:
 The server(s) connect to a mainframe and perform time-consuming,
 repetitive transactions to collect the data that has been requested.
 Thus, these servers are slow, waiting several seconds for each
 response, but they do not put a large load on the local processor.
 So I want many of them running in parallel.

Makes sense.

 Are you proposing that I use Apache/mod_perl child processes to do
 the transactions to the mainframe?  That doesn't seem right.
 They are then not available to listen for HTTP requests, which is 
 the whole purpose of an Apache child process.

That's the point.  By using the Apache/mod_perl processes for all
work, you can easily design for peak load.  It's all work.  You can
still serve HTTP requests even when your machine is loaded doing work
of other sorts.

We do this for e-commerce, web scraping, mail handling, etc.
Everything goes through Apache/mod_perl.  No fuss, no muss.

 You seem to advocate Apache/mod_perl for end-user (returning HTML)
 and server-to-server (RPC) use.  That makes sense.
 But it doesn't seem to make sense for my family of servers that
 spend all of their time waiting for the mainframe to return their
 next transaction.

Can you do asynchronous I/O?  You'll be a lot more efficient memory-
and CPU-wise if you send a series of messages and wait for the results
to come in.  Consuming a Unix/mainframe process slot (or even a
thread) for something like this is very inefficient.
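The mechanism behind that asynchronous style is select()-based multiplexing: wait on several handles at once and service whichever becomes readable first.  In this minimal sketch a plain pipe stands in for the mainframe connections.

```perl
use strict;
use warnings;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe: $!";
syswrite $writer, "response from backend\n";   # pretend a reply arrived

# can_read() would block across many handles at once; here only one.
my $sel = IO::Select->new($reader);
if (my @ready = $sel->can_read(1)) {
    sysread $ready[0], my $buf, 1024;
    print $buf;                                # prints "response from backend"
}
```

With many backend connections in the set, one process services whichever replies first instead of tying up a process or thread per pending request.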

I worked on a CORBA-based Web server for Tandem, which didn't use
threads.  Instead the servers would do asynchronous I/O to the
resources they were responsible for.  I built the CGI component, which
on Tandem was a gateway to Tandem's transaction monitor, Pathway.
All CGI processes were managed by a single process which accepted
requests via CORBA and fired off messages to Pathway.  When Pathway
would respond, the CORBA response would be sent.  Replace CORBA with
HTTP, and you have a simpler, more efficient solution.

One other trick you might try is simply hanging onto the HTTP request
until all the jobs for a particular user finish.  If you have, say, 50
jobs and they run in parallel, they might get done in under 30 seconds,
which is short enough for a person to wait, and that way you don't deal
with the whole database/polling/garbage-collection piece.

Rob