Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-27 Thread Rob Nagler
Perrin Harkins writes:
 Bas A.Schulte wrote:
   What do I do when the delivery mechanism has failed for 6 hours and I have
   12000 messages in the queue *and* make sure current messages get sent in time?
 
 I don't know, that's an application-specific choice.  Of course JMS
 doesn't know either.

This is one of the endemic problems with J2EE.  It doesn't know, and
it has to offer you lots of options to allow you to control the
horizontal and vertical.  Since it is a distributed platform, it can't
export hooks (callbacks) that would let you decide on the fly.  The
options get out of control, and make it look like the system is
fancier than it really is.  Rather, when you see an option, it usually
means the developers couldn't agree on what to do (paraphrased from
Joel Spolsky, http://www.joelonsoftware.com/).

With bOP, we tend to make policy decisions like this centrally, e.g.,
no exactly-once semantics.  There's a real cost, but then we've used
bOP for a wide variety of batch and Web applications without much
strain so we keep doing it this way.  When we stress the system too
much, we add a decision point (option) for the programmer.  However,
we only do this after careful deliberation.  This is one of the
reasons we don't release bOP in parts as some have suggested.  You
can use it in layers, but every application we've built ends up using
all the layers.

J2EE has too many competing/conflicting components, and each of those
components can be configured in myriad ways.  Only experienced
builders of distributed systems know the trade-offs.  J2EE
is sold as an everyman's platform for everybody's problem.  This means
people often get caught using the wrong tool (entity beans) the wrong
way (a bean per DB row).  There's no easy answer to the problem of
distributed systems (esp. one as complex as SMS message queueing), and
J2EE gives one the impression there is, all imho, of course. :-)

BTW, the issue of exactly-once vs at-most-once is a tough one (and was
subject to much debate in the 80s).  JMS tries to guarantee
exactly-once, but that's really hard to do.  Especially in an SMS
situation where network partitioning is a real problem.  My
alphanumeric pager service holds messages for 3 days, and that's a
long time imo.  They can only do this because pagers aren't
bi-directional (for the most part).  Once you get into SMS space,
where devices are bi-directional and much more useful, you have a real
problem promising exactly-once semantics.
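
For illustration, the usual compromise is at-least-once delivery plus
an idempotent receiver, which gets you exactly-once *processing*
without exactly-once *delivery*.  A minimal Perl/DBI sketch (table and
sub names are invented, and it assumes a $dbh with AutoCommit off):

    use DBI;

    # Assumes a dedup table:
    #   create table msg_seen_t (msg_id varchar(64) primary key)
    sub handle_message {
        my ($dbh, $msg_id, $payload) = @_;
        local $dbh->{RaiseError} = 0;   # a duplicate key is expected, not fatal
        my $fresh = $dbh->do(
            'insert into msg_seen_t (msg_id) values (?)', undef, $msg_id);
        unless ($fresh) {               # redelivered: already handled once
            $dbh->rollback;
            return;
        }
        deliver($payload);              # placeholder for the real work; a
                                        # crash between here and the commit
                                        # is exactly the hard case above
        $dbh->commit;
    }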

Rob





Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Bas A. Schulte
Hi all,

On Tuesday, November 19, 2002, at 11:09 PM, Perrin Harkins wrote:


Stephen Adkins wrote:


So what I think you are saying for option 2 is:

   * Apache children (web server processes with mod_perl) have two
 personalities:
   - user request processors
   - back-end work processors
   * When a user submits work to the queue, the child is acting in a
 user request role and it returns the response quickly.
   * After detaching from the user, however, it checks to see if fewer
 than four children are processing the queue and if so, it logs into
 the mainframe and starts processing the queue.
   * When it finishes the request, it continues to work the queue until
 no more work is available, at which time, it quits its back-end
 processor personality and returns to wait for another HTTP request.

This just seems a bit odd (and unnecessarily complex).


It does when you put it like that, but it doesn't have to be that way.


I've implemented the exact thing Perrin describes in our SMS game 
platform (read a bit about it here: 
http://perl.apache.org/outstanding/success_stories/sms_server.html).

When a synchronous request comes in that triggers some event that has to 
take place in the future *and* run in the same Apache server instance, 
I have an external (simple) daemon that reads timer events from a shared 
database table and posts HTTP requests back to the Apache server 
instance.

The reason I did it like this is that I can easily (not to mention 
quickly) run perl code in Apache, *and* it is quite a stable server, much 
more stable than something I could whip up in perl myself.  I did try 
some perl preforking server code (from Lincoln D. Stein's book and 
Net::Server::PreFork, as well as some self-programmed stuff), but none of 
them seemed to be stable/fast under heavy load, even though I would have 
preferred that approach since it would let me handle data-sharing between 
children via the parent, which always seems to be an issue in 
Apache/mod_perl.

The only thing that is problematic now and then is that the Apache child 
processes my perl code runs in are not easily coordinated (at least I 
still haven't found a good way).  So this situation (from Stephen's 
mail):

We have a fixed number of mainframe login id's, so we can only run a
limited number (say 4) of them at a time.


still is something I haven't figured out. Basically, I need some way to 
coordinate the children so each child can find out what the other 
children are doing.

BTW: I've been reading up a lot on J2EE lately and it appears more and 
more that a J2EE app server could quite nicely provide for my needs 
(despite all its shortcomings and issues, of course).  Now if only there 
were a P5EE app server ;)

Regards,

Bas.




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Rob Nagler
Bas A.Schulte writes:
 still is something I haven't figured out. Basically, I need some way to 
 coordinate the children so each child can find out what the other 
 children are doing.

Use a table in your database.  The DB needs to support row level
locking (we use Oracle).   Here's an example:

insert into resource_lock_t (instance_count) values (1)

Don't commit yet.  Rather, right before committing, delete the row:

delete from resource_lock_t where instance_count = 1

Anybody waiting for instance_count #1 will block until the delete
happens.  You only allow up to four inserts (instance_count is the
primary key).
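
In Perl that might look something like this (a sketch; the DSN and the
work sub are placeholders, and error handling is trimmed):

    use DBI;

    my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'pw',
                           { AutoCommit => 0, RaiseError => 1 });

    # Grab one of the four slots; the insert blocks while another
    # session holds an uncommitted row with the same primary key.
    my $slot = 1 + $$ % 4;
    $dbh->do('insert into resource_lock_t (instance_count) values (?)',
             undef, $slot);

    do_mainframe_work();    # placeholder for the work being serialized

    # Release right before committing, as described above.
    $dbh->do('delete from resource_lock_t where instance_count = ?',
             undef, $slot);
    $dbh->commit;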

Rob





Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Perrin Harkins
Bas A.Schulte wrote:

none of
them seemed to be stable/fast under heavy load even though I would have 
preferred that as it would allow me to do something to handle 
data-sharing between children via the parent, which always seems to be an 
issue in Apache/mod_perl.

What are you trying to share?  In addition to Rob's suggestion of using 
a database table (usually the best for important data or clustered 
machines) there are other approaches like IPC::MM and MLDBM::Sync.

Basically, I need some way to 
coordinate the children so each child can find out what the other 
children are doing.

Either of the approaches I just mentioned would be fine for this.


BTW: I've been reading up a lot on J2EE lately and it appears more and 
more that a J2EE app server could quite nicely provide for my needs 
(despite all shortcomings and issues of course).

What is it that you think you'd be getting that you don't have now?

- Perrin




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Perrin Harkins
Nigel Hamilton wrote:

	I need to fork a lot of processes per request ... the memory cost 
of forking an apache child is too high though.
	
	So I've written my own mini webserver in Perl

It doesn't seem like this would help much.  The thing that makes 
mod_perl processes big is Perl.  If you run the same code in both they 
should have a similar size.

- Perrin



Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Bas A. Schulte
Hi Perrin,

On Tuesday, November 26, 2002, at 06:14 PM, Perrin Harkins wrote:


Bas A.Schulte wrote:

none of
them seemed to be stable/fast under heavy load even though I would 
have preferred that as it would allow me to do something to handle 
data-sharing between children via the parent, which always seems to be 
an issue in Apache/mod_perl.

What are you trying to share?  In addition to Rob's suggestion of using 
a database table (usually the best for important data or clustered 
machines) there are other approaches like IPC::MM and MLDBM::Sync.

I don't want to use a database table for the sole purpose of sharing 
data; I run the Apache/mod_perl servers to handle different components of 
our system, and some of them run on top of a database while others 
don't.

Also, the things I would want to share are fairly dynamic, so a 
roundtrip to a database would probably add quite a bit of overhead.

I have been looking at some of the IPC::Share* modules, the one I think 
I can use is (not sure here) IPC::ShareLite, but that darned thing won't 
install on my dev. machine (iBook/OS X) so I've been postponing things a 
bit ;)

My current plan is IPC::MM, stay tuned.


As to *what* I'm trying to share: I don't really know yet ;) Dynamic 
stuff like:

- what is a given child doing (to do things like: ok, I'm currently 
pushing data to some client in 5 children, and I don't want to have 
another child do this now, so stuff this task in a queue somewhere so I 
can process it later);
- application state. This is domain-specific so it's a bit hard to 
explain what I mean. I need serialized and *fast* access to this info so 
I would prefer not having this in my database.

NB: I posted a question on the first issue (look for "IPC suggestions 
sought/talking between children?" somewhere in the mod_perl mailing list 
archives; I never seem to recall the proper archive site), but didn't 
get any feedback on it, as it probably goes beyond what someone would 
normally want from a web server.


BTW: I've been reading up a lot on J2EE lately and it appears more and 
more that a J2EE app server could quite nicely provide for my needs 
(despite all shortcomings and issues of course).

What is it that you think you'd be getting that you don't have now?


Again, I don't know exactly, but when I read about entity, session, and 
message beans, JMS, etc., it bears a lot of resemblance to what I'm 
currently doing by hand, i.e. implementing functionality like that on 
top of a bare Apache/mod_perl server.

A good example would be JMS: you get this for free (with JBoss 
anyway ;)) in a J2EE app server, but there's no obvious choice for us 
perl guys.  There are some options I see now and then (Spread/Stem/POE), 
but none of them is obvious in the sense of being used by a lot of 
people to solve the type of problems JMS solves, so there's really no 
one to turn to for advice; again, I'm building a layer between the raw 
metal and my own stuff.

BTW, the same thing applies to the data-sharing issue: I have raw metal 
(Apache/mod_perl and IPC::MM) and need to implement an API on top of it 
before I have the needed functionality.  Once again I'm building 
infrastructure before I can solve my actual business problems.

I think these issues point out that we are missing *something*, I know 
*I* am :)

Regards,

Bas.




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Stephen Adkins
At 07:04 PM 11/26/2002 +0100, Bas A.Schulte wrote:
On Tuesday, November 26, 2002, at 06:14 PM, Perrin Harkins wrote:
 Bas A.Schulte wrote:

I have been looking at some of the IPC::Share* modules, the one I think 
I can use is (not sure here) IPC::ShareLite, but that darned thing won't 
install on my dev. machine (iBook/OS X) so I've been postponing things a 
bit ;)

My current plan is IPC::MM, stay tuned.

Hi,

Take a look at

   http://www.officevision.com/pub/p5ee/components.html#shared_storage

There are references to every major shared storage method I have seen
discussed on the mod_perl list or elsewhere.

There are also some interesting links to mod_perl list discussions
on performance comparison and synchronization using the various
tools.

Stephen





Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Nigel Hamilton
 
 Quite odd. I read the performance thread that's on the P5EE page which 
 showed that DBI (with MySQL underneath) was very fast, came in 2nd. 
 Anyone care to elaborate why this is? After all, shared-memory is a 
 thing in RAM, why isn't that faster?
 

Hi Bas,

You made some really interesting points in your last email ... and 
I hope it sparks a full discussion. 

Just a quick point on the MySQL observation above ... MySQL in-memory
(HEAP) tables may be even quicker, again because the disk is not
involved. Your messages could be inserted into a buffer table with a
microsecond timestamp, and then one or more separate processes pop
messages off the queue. This hands the memory consumption problem to
MySQL and provides multiple ways of talking to the queue (cronjobs,
apache kids etc).
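
A rough sketch of that buffer table in Perl/DBI (names invented;
TYPE=HEAP is the old MySQL spelling of what is now ENGINE=MEMORY):

    use DBI;
    use Time::HiRes qw(gettimeofday);

    my $dbh = DBI->connect('dbi:mysql:queue', 'user', 'pw',
                           { RaiseError => 1 });
    $dbh->do(q{
        create table if not exists msg_buffer (
            ts_usec bigint not null primary key,
            msg     varchar(255) not null
        ) type=heap
    });

    # Producer: stamp each message with a client-side microsecond time.
    my ($sec, $usec) = gettimeofday();
    $dbh->do('insert into msg_buffer (ts_usec, msg) values (?, ?)',
             undef, $sec * 1_000_000 + $usec, 'message body here');

    # Consumer (cron job, apache kid, ...): pop the oldest message.
    # Concurrent consumers would need a lock around these two statements.
    my ($ts, $msg) = $dbh->selectrow_array(
        'select ts_usec, msg from msg_buffer order by ts_usec limit 1');
    $dbh->do('delete from msg_buffer where ts_usec = ?', undef, $ts)
        if defined $ts;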

At Turbo10, our click-through system choked under heavy load until
we implemented it as a memory buffer (MySQL hash table) ... just a 
thought.


Nigel


-- 
Nigel Hamilton
Turbo10 Metasearch Engine

email:  [EMAIL PROTECTED]
tel:+44 (0) 207 987 5460
fax:+44 (0) 207 987 5468

http://turbo10.com  Search Deeper. Browse Faster.




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Rob Nagler
Perrin Harkins writes:
 I think you are vastly over-estimating how much effort JMS/EJB/etc. 
 would save you.

EJB doesn't save you anything.  It creates work and complexity,
esp. Entity Beans.  I've built large systems using EJB and Perl.  The
Perl project was built faster, with fewer people, runs more reliably,
runs faster, and the Perl company is still in business, which is the
only point that really counts. :-)

JMS does solve an interesting problem, but don't use Message Beans,
use raw JMS.  Make sure JMS isn't a solution looking for a problem,
though.  Oftentimes the problem is better and more robustly solved by
implementing pending replies from the server.  This avoids a number
of resource management issues, which can really bog down a server.

Rob






Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-26 Thread Perrin Harkins
Bas A.Schulte wrote:
 Quite odd. I read the performance thread that's on the P5EE page which
 showed that DBI (with MySQL underneath) was very fast, came in 2nd.
 Anyone care to elaborate why this is? After all, shared-memory is a
 thing in RAM, why isn't that faster?

I have an article in the works which explains all of this, but the 
short explanation is that they work by serializing the entire memory 
structure with Storable and stuffing it into a shared memory segment, 
and even reading it requires loading and de-serializing the whole thing. 
IPC::MM and the file-based ones are much more granular.  Also, file 
systems are very fast on modern OSes because of efficient VM systems 
that buffer files in memory.
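
For example, a shared "what are the children doing" table with
MLDBM::Sync might look like this (file name invented; each value is
serialized separately, so reading one key doesn't drag the whole
structure in):

    use MLDBM::Sync;
    use MLDBM qw(DB_File Storable);   # serialize values with Storable
    use Fcntl qw(:DEFAULT);

    tie my %status, 'MLDBM::Sync', '/tmp/child_status.dbm',
        O_CREAT | O_RDWR, 0640 or die "tie: $!";

    $status{$$} = { task => 'delivering', since => time };   # this child
    my $busy = grep { $_->{task} eq 'delivering' } values %status;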

 I'm not saying I want entity beans here ;) It's just that I've been
 doing perl to pay for bills and stuff the past few years and see a lot
 of people having some (possibly perceived?) need for something missing
 in perl.

It may be that they just want someone to tell them how they should do
things.  J2EE does provide that to a certain degree.

 If I read your mail, you mention some solutions/directions for some
 problems I'm dealing with, but that's just my issue (I think; it's just
 coming to me): we have a lot of raw metal but we do have to do a lot
 of welding and fitting before we can solve our business problems.
 
 That is basically the point.

I don't think it's nearly that bad.  After my eToys article got 
published, I got several e-mails from people saying something like "we 
want to do this, but our boss says we have to buy something because of 
all the INFRASTRUCTURE code we would have to write."

Infrastructure?  What infrastructure?  The only stuff we wrote that was 
really independent of our application logic were things like a logging 
class and a singleton class, which can now be had on CPAN.  We wrote our 
own cache system, but that's because it worked in a very specific way 
that the available tools didn't handle.  I think I could do that with 
CPAN stuff now too.

 To illustrate that, I'll try to give a real-world example

Thanks, it's much easier to talk about specific situations.

 To deliver these messages, I send them off to another server (using my
 own invented pseudo-RMI to call a method on that server).

I would use HTTP for that, because I'm too lazy to write the RMI code 
myself.
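
Something like this is about all it takes (a sketch; the URL and
parameters are invented):

    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new(timeout => 10);
    my $res = $ua->post('http://delivery-host/deliver',
                        { msg_id => 42, text => 'your message here' });
    die 'remote call failed: ' . $res->status_line . "\n"
        unless $res->is_success;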

 1. The server that does the delivery has plenty of threads (er,
 Apache/mod_perl children) so I hope I have enough of them to deliver the
 messages at the rate the backend server generates them: one child might
 take up to 5 seconds to deliver the message but there are plenty of
 children.

 Not good. I've seen how this works and miserably fails when a delivery
 mechanism barfs.

If they were so quick to process that you could do it that way, I would
have just handled them in the original mod_perl server with a
cleanup_handler.  Obviously they are not, so that's not an option here.

 2. Same as 1 but I never allow one delivery mechanism to use all my
 Apache/mod_perl children by adding some form of IPC (darned, need to
 solve my data sharing issues first!)

I think they are already solved if you look at the modules I suggested.

 so the children check what the
 others are currently doing: if a request comes in for a particular
 delivery mechanism, I check if we're already doing N delivery attempts
 and drop the request somewhere (database/file, whatever) if not. I have
 a daemon running that monitors that queue.

I would structure it like this:
- Original server takes request, and writes it to a database table that
holds the queue.
- A cron job checks the queue for messages, reads the status from
MLDBM::Sync to see if we have free processes, and passes the request to
mod_perl if we do.  (Note that this could also be done with something
like PersistentPerl instead.)  If there are no free processes, they are
left on the queue.
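
A sketch of that cron job (the table, dbm file, and URL are invented
names):

    use DBI;
    use Fcntl qw(:DEFAULT);
    use LWP::UserAgent;
    use MLDBM::Sync;
    use MLDBM qw(DB_File Storable);

    my $dbh = DBI->connect('dbi:mysql:app', 'user', 'pw',
                           { RaiseError => 1 });
    tie my %status, 'MLDBM::Sync', '/tmp/delivery_status.dbm',
        O_CREAT | O_RDWR, 0640 or die "tie: $!";

    my $busy = grep { $_ } values %status;   # children marked busy
    my $free = 4 - $busy;                    # cap of four workers
    exit 0 if $free <= 0;                    # all busy; try again next run

    my $ua   = LWP::UserAgent->new;
    my $jobs = $dbh->selectall_arrayref(
        "select id, payload from message_q order by id limit $free");
    for my $job (@$jobs) {
        my ($id, $payload) = @$job;
        my $res = $ua->post('http://localhost/deliver',
                            { id => $id, payload => $payload });
        # Failed deliveries simply stay queued for the next cron run.
        $dbh->do('delete from message_q where id = ?', undef, $id)
            if $res->is_success;
    }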

 That daemon gets complicated quickly as it also has to throttle delivery
 attempts

My approach only puts that logic in the cron job.

 I need some form of persistent storage (with locking)

The relational database.  Or MLDBM::Sync if you prefer.

 what do
 I do when the delivery mechanism has failed for 6 hours and I have 12000
 messages in the queue *and* make sure current messages get sent in time?

I don't know, that's an application-specific choice.  Of course JMS
doesn't know either.

 3. I install qmail on the various servers, and use that to push messages
 around. This'll take me a week or so (hopefully) to get it running
 reliably in production

One of the major selling points for qmail is easier setup.  You could
use pretty much any mail server though if you have more experience with
something else.  I just like qmail because it's fast.

 Later on, I
 realise that for each message, a fullblown process is forked *per
 message*: load up perl, compile perl code etc..

I described how to avoid this in another message: use PersistentPerl or
equivalent, or pass 

Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)

2002-11-22 Thread Matt Sergeant
On Friday, Nov 22, 2002, at 02:49 Europe/London, Gunther Birznieks 
wrote:

I disagree. I think it depends on the protocol. A well designed 
protocol for an application will spread and stand the test of time. 
Sometimes the protocol doesn't have to be well designed, but just that 
it's standard can help tremendously.

eg if we were a world that said HTTP is it and we should do 
everything over HTTP, then would you really see SMTP over HTTP? SNMP 
over HTTP? telnet over HTTP? Why?

This doesn't really make sense to me.

[OT, because I know this isn't really your point]

As someone whose entire job revolves around SMTP these days, I'd love 
to see mail go over HTTP. SMTP's got no concept of negotiation. It's 
got little in the way of versioning (HELO vs EHLO). It's got no 
permanent redirect (e.g. [EMAIL PROTECTED] is now 
[EMAIL PROTECTED]). It's got very weak handling of binary 
data. Writing mail server plugins is very non-standardised.

Don't get me wrong, SMTP is a great protocol, but HTTP is sometimes 
just *so* much nicer :-)

Matt.



Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-20 Thread Perrin Harkins
Aaron Johnson wrote:


This model has eased my testing as well: since I can run the script
completely external to the web server, I can run it through a debugger if
needed.



You realize that you can run mod_perl in the debugger too, right?  I use 
the profiler and debugger with mod_perl frequently.

- Perrin



Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-20 Thread Perrin Harkins
Aaron Johnson wrote:


I know you _can_ , but I don't find it convenient.



For me it's pretty much the same as debugging a command-line script.  To 
debug a mod_perl handler I just do something like this:

httpd -X -Ddebug

Then I hit the URL with a browser or with GET and it pops me into the 
debugger.  I have httpd.conf set up to add the PerlFixupHandler 
+Apache::DB line when it sees the debug flag.
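
The relevant httpd.conf stanza would look roughly like this (adapted
from the Apache::DB docs to the -Ddebug flag):

    <IfDefine debug>
        <Perl>
            use Apache::DB ();
            Apache::DB->init;
        </Perl>
        PerlFixupHandler +Apache::DB
    </IfDefine>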

I still don't like to give apache long processes to manage, I feel this
can be better handled external of the server and in my case it allows
for automation/reports on non-mod_perl machines.



I try to code it so that the business logic is not dependent on a 
certain runtime environment, and then write a small mod_perl handler to 
call it.  Then I can use the same modules in cron jobs and such.  It can 
get tricky in certain situations though, when you want to optimize 
something for a long-running environment but don't want to break it for 
one-shot scripts.

- Perrin



Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-20 Thread Rob Nagler
Perrin Harkins writes:
 I try to code it so that the business logic is not dependent on a 
 certain runtime environment, and then write a small mod_perl handler to 
 call it.

I've been doing a lot of test-first coding.  It makes it so that you
start Apache, and the software just runs.  With sufficient granularity
of unit tests, we find that we don't use the debugger.  Run the test,
and it tells you what's wrong.

Rob





Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)

2002-11-20 Thread Rob Nagler
Gunther Birznieks writes:
  In the context of what you are saying, it seems as if everyone should 
 just stick to using TCP/IP/Telnet as a protocol and then the world would 
 be a better place.

Once upon a time, there were OSI, SNA, DECnet, etc.  Nowadays, all
computers talk IP, even if you connect from AOL.  Yes, the other
protocols are still around, but nobody in their right mind would
recommend them anymore.

 But I don't think this is so. Everyone ends up creating their own 
 protocols, their own algorithms on top of TCP on how to communicate.

Because it's FUN, and you probably can get a Ph.D. thesis out of it. ;-)

 In 
 a way it is simpler because you just have the freedom to create whatever 
 you want. But in another way, it is a nightmare because everyone will 
 just implement their own way of doing things. This can be OK in some 
 contexts, but I find it difficult to believe that this is the best thing 
 overall.

I'm not advocating this.  Rather, I am recommending using a
well-known, and arguably the most widely used, protocol:
application/x-www-form-urlencoded--and its near cousin
multipart/form-data.  Since that's a mouthful, we can just call it HTTP,
and our implementation is LWP and Apache.

 At least with J2EE, for every major standard or protocol
 implemented, there is only one way to do it.  With Perl, you
 actually have more confusion because there are many more ways to do
 it. More ways to do templating, more ways to do middleware, more
 ways to do serialization of objects, etc...

There are an equivalent number of ways in both languages.  If you are
saying that you could build a standard component in, say, EJB, and
sell it, well, that's just not the case.  That's the pipe dream of
CORBA.  The only thing close to portable protocols is HTTP.  Sabre,
for example, gives you a library, and you have to interface to it.
However, authorize.net's interface is HTTP, and I can write my own
library in under 100 lines of Perl, which matches my application, and
doesn't require me to install anything.

There's such a thing as standard protocols, but every application uses
them differently.

Rob






asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-19 Thread Stephen Adkins
At 08:18 PM 11/18/2002 -0700, Rob Nagler wrote:

We digress.  The problem is to build a UI to Sabre.  I still haven't
seen any numbers which demonstrate the simple solution doesn't work.
Connecting to Sabre is no different than connecting to an e-commerce
gateway.  Both can be done by connecting directly from the Apache
child to the remote service and returning a result.

Hi,

My question with this approach is not whether it works for synchronous
execution (the user is willing to wait for the results to come back)
but whether it makes sense for asynchronous execution (the user will
come back and get the results later).

In fact, we provide our users with the option:

   1. fetch the data now and display it, OR
   2. put the request in a queue to be fetched and then later displayed

We have a fixed number of mainframe login id's, so we can only run a
limited number (say 4) of them at a time.

So what I think you are saying for option 2 is:

   * Apache children (web server processes with mod_perl) have two
 personalities:
   - user request processors
   - back-end work processors
   * When a user submits work to the queue, the child is acting in a
 user request role and it returns the response quickly.
   * After detaching from the user, however, it checks to see if fewer
 than four children are processing the queue and if so, it logs into
 the mainframe and starts processing the queue.
   * When it finishes the request, it continues to work the queue until
 no more work is available, at which time, it quits its back-end
 processor personality and returns to wait for another HTTP request.

This just seems a bit odd (and unnecessarily complex).
Why not let there be web server processes and queue worker processes
and they each do their own job?  Web servers seem to me to be for
synchronous activity, where the user is waiting for the results.

Stephen

P.S. Another limitation of the "use Apache servers for all server
processing" philosophy seems to be scheduled events or system events
(those not initiated by an HTTP request, which are user events).

example: Our system allows users to set up a schedule of requests to be run.
i.e. "Every Tuesday at 3:00am, put this request into the queue."
This is a scheduled event rather than a user event.
How is a web server process going to wake up and begin processing this?
(unless of course everyone who puts something into the queue must send
a dummy HTTP request to wake up the web servers)






Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-19 Thread Perrin Harkins
Stephen Adkins wrote:


So what I think you are saying for option 2 is:

   * Apache children (web server processes with mod_perl) have two
 personalities:
   - user request processors
   - back-end work processors
   * When a user submits work to the queue, the child is acting in a
 user request role and it returns the response quickly.
   * After detaching from the user, however, it checks to see if fewer
 than four children are processing the queue and if so, it logs into
 the mainframe and starts processing the queue.
   * When it finishes the request, it continues to work the queue until
 no more work is available, at which time, it quits its back-end
 processor personality and returns to wait for another HTTP request.





This just seems a bit odd (and unnecessarily complex).



It does when you put it like that, but it doesn't have to be that way. 
I would separate the input (user or queue) from the processing part. 
You'd have a module that runs in mod_perl which knows how to process 
requests.  You have a separate module which can provide a UI for placing 
 requests.  Synchronous ones go straight to processing, while asynch 
ones get added to the queue.

You'd also have a controlling process that polls the queue and if it 
finds anything it uses LWP to send it to mod_perl for handling.  I would 
make this a tiny script triggered from cron if possible, since cron is 
robust and can handle outages and error reporting nicely.
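
To make the separation concrete, a sketch (package names invented): the
processing logic lives in a plain module, and the mod_perl 1.x handler
is just a thin wrapper, so the cron-driven poller and interactive users
go through the same code.

    package MyApp::Processor;
    sub process {
        my ($class, $args) = @_;
        # ... log in to the mainframe, run the request, store results ...
    }

    package MyApp::Handler;
    use Apache::Constants qw(OK);
    sub handler {
        my $r = shift;
        my %args = $r->args;            # same path for live and queued work
        MyApp::Processor->process(\%args);
        return OK;
    }
    1;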

Why not let there be web server processes and queue worker processes
and they each do their own job?  Web servers seem to me to be for
synchronous activity, where the user is waiting for the results.



When I think of queue processing, I think of a system for handling tasks 
in parallel that provides a simple API for plugging in logic, a 
well-defined control interface, logging, easy configuration... sounds 
like Apache to me.  You just need a tiny control process to trigger it 
via LWP.  Apache is already a system for handling a queue of HTTP 
requests in parallel, so you just have to make your requests look like HTTP.

You certainly could do this other ways, but you'd probably have to write 
a lot more code or else use something far less reliable than Apache.

P.S. Another limitation of the "use Apache servers for all server
processing" philosophy seems to be scheduled events or system events
(those not initiated by an HTTP request, which are user events).


Cron/at + LWP.

- Perrin




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-19 Thread Valerio_Valdez Paolini

Hi Stephen,

On Tue, 19 Nov 2002, Stephen Adkins wrote:

 My question with this approach is not whether it works for synchronous
 execution (the user is willing to wait for the results to come back)
 but whether it makes sense for asynchronous execution (the user will
 come back and get the results later).

What kind of interface will you provide to the end users?

 In fact, we provide our users with the option:

1. fetch the data now and display it, OR
2. put the request in a queue to be fetched and then later displayed

 We have a fixed number of mainframe login id's, so we can only run a
 limited number (say 4) of them at a time.

So it is possible that an immediate request is queued if the system has
already reached its maximum allowed logins. In other words, an end user
can ask to display data immediately, but your middleware can answer that
the request has been queued, saying something like 'here is your job id,
you are at position n, see you later'.

Moreover, you must preserve the order of requests. And if I recall
correctly, you talked about a sort of queue listing and some job
manipulation. Whatever your choice, you undoubtedly need to serialize
requests and enqueue them using a dbms. It is the simplest approach.

Given methods to add, list or remove requests from this kind of queue,
mod_perl (even plain cgi scripts) can use these methods to manipulate a
user's job. Using access control supplied by Apache, it is possible to
give different access rights to users of the middleware.

Requests from end users will always be enqueued by an Apache child,
which will get a job id and its position in the queue. If the job is at
the top of the queue, you can wait for its completion immediately;
otherwise you can tell the user to check his job queue later.
Users can remove jobs from the queue.

All completed jobs will be stored somewhere (file system and/or db) and
can be listed by legitimate users. Completed jobs will show up in a
separate queue.

An external entity will dequeue jobs and process them, probably using
something like Parallel::ForkManager to limit concurrent requests.
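
With Parallel::ForkManager the dequeuing loop stays small (a sketch;
fetch_next_job/run_job stand in for the actual queue access):

    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(4);  # at most four concurrent logins
    while (my $job = fetch_next_job()) {
        $pm->start and next;                 # parent: keep dequeuing
        run_job($job);                       # child: process one job
        $pm->finish;
    }
    $pm->wait_all_children;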

Another entity will enqueue recurring jobs. Jobs scheduled for future
processing should always be enqueued immediately, or I can't imagine
a coherent interface to remove jobs.

These entities look like daemons that can be spawned and controlled using
code executed by Apache.

Please note that I never mentioned html, using Apache as your
infrastructure you can build whatever interface you need.

Requests recorded in the db can also be used to implement a cache, with
results reused by subsequent requests.
It would be possible to collapse identical requests (to save logins).

Obviously it is possible to replace Apache with POE or Stem, but I don't
know how, sorry. There are many other solutions, but this sketch describes
my way to do it. Sorry for the length of this message.

Ciao, Valerio


 Valerio Paolini, http://130.136.3.200/~paolini
--
 Linux, the Cheap Chic for Computer Fashionistas




Re: asynchronous execution, was Re: implementing a set of queue-processing servers

2002-11-19 Thread Aaron Johnson
On Tue, 2002-11-19 at 16:28, Stephen Adkins wrote:
 At 08:18 PM 11/18/2002 -0700, Rob Nagler wrote:
 
 We digress.  The problem is to build a UI to Sabre.  I still haven't
 seen any numbers which demonstrate the simple solution doesn't work.
 Connecting to Sabre is no different than connecting to an e-commerce
 gateway.  Both can be done by connecting directly from the Apache
 child to the remote service and returning a result.
 
 Hi,
 
 My question with this approach is not whether it works for synchronous
 execution (the user is willing to wait for the results to come back)
 but whether it makes sense for asynchronous execution (the user will
 come back and get the results later).
 
 In fact, we provide our users with the option:
 
1. fetch the data now and display it, OR
2. put the request in a queue to be fetched and then later displayed
 
 We have a fixed number of mainframe login id's, so we can only run a
 limited number (say 4) of them at a time.
 
 So what I think you are saying for option 2 is:
 
* Apache children (web server processes with mod_perl) have two
  personalities:
- user request processors
- back-end work processors
* When a user submits work to the queue, the child is acting in a
  user request role and it returns the response quickly.
* After detaching from the user, however, it checks to see if fewer
  than four children are processing the queue and if so, it logs into
  the mainframe and starts processing the queue.
* When it finishes the request, it continues to work the queue until
  no more work is available, at which time, it quits its back-end
  processor personality and returns to wait for another HTTP request.
 
 This just seems a bit odd (and unnecessarily complex).
 Why not let there be web server processes and queue worker processes
 and they each do their own job?  Web servers seem to me to be for
 synchronous activity, where the user is waiting for the results.
 

I am doing something similar right now in a project.  It has to make
approx. 220 requests to outside sources in order to compile a completed
report.  How long a report takes to create varies with the data sources
and network traffic.  This is the solution I have in place currently:

1) User visits web page (handled by mod_perl) and they make the request
for a report.

2) The request parameters are stored in a temp file and the user is
redirected to a wait page.  The time spent on the wait page varies; an
approximate time is calculated based on query complexity.  The user
session is given a key that matches the temp file name.

3) A separate dedicated server (Proc::Daemon based) picks up the temp
file and spawns a child to process it.  This daemon looks for new temp
files every X seconds, where X is 15 seconds, but it could easily be
adjusted.  (A minimal sketch of such a daemon appears after step 5.)
It keeps a queue of the temp files that have been processed and drops
them from the queue after 45 minutes even if they haven't run.

4) The child recreates the users object and runs the report, when it
completes it deletes the temp file.  If it fails to complete the temp
file remains.

5) When the auto refresh takes place, the system determines if the
user's request has completed by looking for the temp file named in their
session data.  If the file exists, they are given another wait page with
a 30 to 120 second wait time.  If it doesn't exist, then the cached
information from the report (just an XML file created from an
XML::Simple dump of the hash containing the report data) is processed
and presented as HTML to the user.
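
A minimal sketch of the pickup daemon from step 3 (the spool directory
and report sub are invented; Proc::Daemon::Init does the detaching):

    use Proc::Daemon;

    Proc::Daemon::Init();                 # detach from the terminal
    $SIG{CHLD} = 'IGNORE';                # auto-reap finished children

    while (1) {
        for my $file (glob '/var/spool/reports/*.req') {
            my $pid = fork();
            next if $pid;                 # parent: keep scanning
            process_report($file);        # child: recreate object, run report
            unlink $file;                 # remove on success (step 4)
            exit 0;
        }
        sleep 15;                         # the "every X seconds" poll
    }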

I had attempted using a mod_perl only solution, but I didn't like tying
up the server with additional processing that could be handled
externally.  This method also allows the server script to reside on
a separate machine (given some shared filesystem: Samba, NFS, etc.)
without having to recreate an entire mod_perl environment.

This model has eased my testing as well: since I can run the script
completely external to the web server, I can run it through a debugger
if needed.  I also use the same script for nightly automated common
reports to limit the number of real-time requests, since the data
doesn't change that frequently in my case.

 Stephen
 
 P.S. Another limitation of the use Apache servers for all server processing
 philosophy seems to be scheduled events or system events (those not
 initiated by an HTTP request, which are user events).
 

I agree with Perrin, you can use LWP to emulate a user's HTTP request if
you want to use an HTTP-style request.

cron/at represents the best way to handle this (IMHO).  In my case I run
the cron job and it generates the temp files; these temp files get
picked up by the looping server (a simple non-mod_perl daemon) and
processed.  So I don't use LWP, but I could send the request to the
server and have it create the temp files just as easily; I just happen
to have the logic abstracted to where I don't need to involve mod_perl.

Aaron 

Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)

2002-11-19 Thread Gunther Birznieks


Rob Nagler wrote:


The antithesis of this is J2EE, which introduces an amazing amount of
complexity through protocol explosion (is it a Message/Session/Entity
Bean, do I use JMX, JMS, RMI, etc.).  It creates tremendous confusion,
and their software is certainly less reliable than Apache.

 

I think this is not a fair statement about J2EE (except the "less 
reliable" part).

In the context of what you are saying, it seems as if everyone should 
just stick to using TCP/IP/Telnet as a protocol and then the world would 
be a better place.

But I don't think this is so. Everyone ends up creating their own 
protocols, their own algorithms on top of TCP on how to communicate. In 
a way it is simpler because you just have the freedom to create whatever 
you want. But in another way, it is a nightmare because everyone will 
just implement their own way of doing things. This can be OK in some 
contexts, but I find it difficult to believe that this is the best thing 
overall.

At least with J2EE, for every major standard or protocol implemented, 
there is only one way to do it. With Perl, you actually have more 
confusion because there are many more ways to do it. More ways to do 
templating, more ways to do middleware, more ways to do serialization of 
objects, etc...