Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Perrin Harkins writes: Bas A.Schulte wrote: What do I do when the delivery mechanism has failed for 6 hours and I have 12000 messages in the queue *and* make sure current messages get sent in time? I don't know, that's an application-specific choice. Of course JMS doesn't know either. This is one of the endemic problems with J2EE. It doesn't know, and it has to offer you lots of options to allow you to control the horizontal and the vertical. Since it is a distributed platform, it can't export hooks (callbacks) that would allow you to decide on the fly. The options get out of control, and make the system look fancier than it really is. Rather, when you see an option, it usually means the developers couldn't agree on what to do (paraphrased from Joel Spolsky, http://www.joelonsoftware.com/). With bOP, we tend to make policy decisions like this centrally, e.g., no exactly-once semantics. There's a real cost, but we've used bOP for a wide variety of batch and Web applications without much strain, so we keep doing it this way. When we stress the system too much, we add a decision point (an option) for the programmer. However, we only do this after careful deliberation. This is one of the reasons we don't release bOP in parts, as some have suggested. You can use it in layers, but every application we've built ends up using all the layers. J2EE has too many competing/conflicting components, and each of those components can be configured in myriad ways. Only builders experienced with distributed systems can know the trade-offs. J2EE is sold as an everyman's platform for everybody's problem. This means people often get caught using the wrong tool (entity beans) the wrong way (a bean per DB row). There's no easy answer to the problem of distributed systems (esp. one as complex as SMS message queueing), and J2EE gives one the impression there is, all imho, of course. 
:-) BTW, the issue of exactly-once vs at-most-once is a tough one (and was subject to much debate in the 80s). JMS tries to guarantee exactly-once, but that's really hard to do, especially in an SMS situation where network partitioning is a real problem. My alphanumeric pager service holds messages for 3 days, and that's a long time imo. They can only do this because pagers aren't bi-directional (for the most part). Once you get into SMS space, where devices are bi-directional and much more useful, you have a real problem promising exactly-once semantics. Rob
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Hi all, On Tuesday, November 19, 2002, at 11:09 PM, Perrin Harkins wrote: Stephen Adkins wrote: So what I think you are saying for option 2 is: * Apache children (web server processes with mod_perl) have two personalities: - user request processors - back-end work processors * When a user submits work to the queue, the child is acting in a user request role and it returns the response quickly. * After detaching from the user, however, it checks to see if fewer than four children are processing the queue and if so, it logs into the mainframe and starts processing the queue. * When it finishes the request, it continues to work the queue until no more work is available, at which time, it quits its back-end processor personality and returns to wait for another HTTP request. This just seems a bit odd (and unnecessarily complex). It does when you put it like that, but it doesn't have to be that way. I've implemented the exact thing Perrin describes in our SMS game platform (read a bit about it here: http://perl.apache.org/outstanding/success_stories/sms_server.html). When synchronous requests come in that trigger some event that has to take place in the future *and* in the same Apache server instance, I have an external (simple) daemon that reads timer events from a shared database table and posts HTTP requests to the Apache server instance. The reason I did it like this is that I can easily (not to mention quickly) run perl code in Apache *and* it is quite a stable server, much more stable than something I could whip up in perl myself. I did try some perl preforking server code (from Lincoln D. Stein's book and Net::Server::PreFork as well as some self-programmed stuff) but none of them seemed to be stable/fast under heavy load, even though I would have preferred that as it would allow me to handle data-sharing between children via the parent, which always seems to be an issue in Apache/mod_perl. 
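[Editor's sketch] Bas's timer daemon can be sketched in a page of Perl. This is a guess at the shape, not his actual code: the `timer_event` table, the DSN, and the URLs are all illustrative placeholders, and only the pure due-event filter is shown as runnable code.

```perl
use strict;
use warnings;

# Pure helper: given the current epoch time and a list of
# [$id, $fire_time, $url] rows, return the events that are due.
sub due_events {
    my ($now, $events) = @_;
    return grep { $_->[1] <= $now } @$events;
}

# Daemon loop (sketch; the DBI/LWP parts are commented out because
# they need a live database and Apache instance):
# use DBI;
# use LWP::UserAgent;
# my $dbh = DBI->connect('dbi:mysql:games', $user, $pass, { RaiseError => 1 });
# my $ua  = LWP::UserAgent->new;
# while (1) {
#     my $rows = $dbh->selectall_arrayref(
#         'select id, fire_time, url from timer_event');
#     for my $e (due_events(time, $rows)) {
#         my ($id, undef, $url) = @$e;
#         my $res = $ua->post($url, { event_id => $id });
#         $dbh->do('delete from timer_event where id = ?', undef, $id)
#             if $res->is_success;
#     }
#     sleep 5;
# }
```

The delete-only-on-success line is what gives at-least-once delivery: a crashed POST leaves the event in the table for the next pass.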
The only thing that is problematic now and then is that the Apache child processes in which my perl code runs are not easily coordinated (at least I still haven't found a good way). So this situation (from Stephen's mail): We have a fixed number of mainframe login id's, so we can only run a limited number (say 4) of them at a time. still is something I haven't figured out. Basically, I need some way to coordinate the children so each child can find out what the other children are doing. BTW: I've been reading up a lot on J2EE lately and it appears more and more that a J2EE app server could quite nicely provide for my needs (despite all its shortcomings and issues, of course). Now if only there were a P5EE app server ;) Regards, Bas.
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Bas A.Schulte writes: still is something I haven't figured out. Basically, I need some way to coordinate the children so each child can find out what the other children are doing. Use a table in your database. The DB needs to support row-level locking (we use Oracle). Here's an example: insert into resource_lock_t (instance_count) values (1) Don't commit yet. Rather, right before committing, delete the row: delete from resource_lock_t where instance_count = 1 Anybody waiting for instance_count #1 will block until the delete happens. You only allow up to four inserts (instance_count is the primary key). Rob
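[Editor's sketch] Rob's scheme translated into DBI terms. The table and SQL are from his example; the DSN, error handling, and the slot-picking helper are invented placeholders. The real arbiter is the database row lock: the insert blocks until the slot's current holder deletes its row and commits.

```perl
use strict;
use warnings;

# Pure helper (an addition, advisory only): pick the first slot we
# believe is free. The database row lock is the real arbiter.
sub first_free_slot {
    my ($busy, $max) = @_;
    my %taken = map { $_ => 1 } @$busy;
    for my $slot (1 .. $max) {
        return $slot unless $taken{$slot};
    }
    return undef;
}

# Rob's scheme, sketched with DBI. The insert blocks on the row lock
# while another session holds the slot; delete + commit releases it.
sub with_slot {
    my ($dbh, $slot, $work) = @_;
    $dbh->do('insert into resource_lock_t (instance_count) values (?)',
             undef, $slot);              # blocks until the slot frees up
    my $ok  = eval { $work->(); 1 };
    my $err = $@;
    $dbh->do('delete from resource_lock_t where instance_count = ?',
             undef, $slot);
    $dbh->commit;                        # releases the row lock
    die $err unless $ok;
    return 1;
}

# my $dbh = DBI->connect('dbi:Oracle:prod', $user, $pass,
#                        { AutoCommit => 0, RaiseError => 1 });
# with_slot($dbh, first_free_slot([], 4), sub { run_mainframe_job() });
```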
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Bas A.Schulte wrote: none of them seemed to be stable/fast under heavy load even though I would have preferred that as it would allow me to do something to handle data-sharing between children via the parent which always seems to be an issue in Apache/mod_perl. What are you trying to share? In addition to Rob's suggestion of using a database table (usually the best for important data or clustered machines) there are other approaches like IPC::MM and MLDBM::Sync. Basically, I need some way to coordinate the children so each child can find out what the other children are doing. Either of the approaches I just mentioned would be fine for this. BTW: I've been reading up a lot on J2EE lately and it appears more and more that a J2EE app server could quite nicely provide for my needs (despite all shortcomings and issues of course). What is it that you think you'd be getting that you don't have now? - Perrin
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Nigel Hamilton wrote: I need to fork a lot of processes per request ... the memory cost of forking an apache child is too high though. So I've written my own mini webserver in Perl It doesn't seem like this would help much. The thing that makes mod_perl processes big is Perl. If you run the same code in both they should have a similar size. - Perrin
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Hi Perrin, On Tuesday, November 26, 2002, at 06:14 PM, Perrin Harkins wrote: Bas A.Schulte wrote: none of them seemed to be stable/fast under heavy load even though I would have preferred that as it would allow me to do something to handle data-sharing between children via the parent which always seems to be an issue in Apache/mod_perl. What are you trying to share? In addition to Rob's suggestion of using a database table (usually the best for important data or clustered machines) there are other approaches like IPC::MM and MLDBM::Sync. I don't want to use a database table for the sole purpose of sharing data; I mean, I run the Apache/mod_perl servers to handle different components of our system, some of which run on top of a database and some of which don't. Also, the things I would want to share are fairly dynamic, so a roundtrip to a database would probably add quite some overhead. I have been looking at some of the IPC::Share* modules; the one I think I can use is (not sure here) IPC::ShareLite, but that darned thing won't install on my dev. machine (iBook/OS X) so I've been postponing things a bit ;) My current plan is IPC::MM, stay tuned. As to *what* I'm trying to share: I don't really know yet ;) Dynamic stuff like: - what a given child is doing (to do things like: ok, I'm currently pushing data to some client in 5 children, and I don't want to have another child do this now, so stuff this task in a queue somewhere so I can process it later); - application state. This is domain-specific so it's a bit hard to explain what I mean. I need serialized and *fast* access to this info so I would prefer not having it in my database. NB: I posted a question on the first issue (look for "IPC suggestions sought" / "talking between children?" somewhere in the mod_perl mailing list; I never seem to recall the proper archive site for it), but didn't get any feedback on it as it probably goes beyond what someone would normally want from a web server. 
BTW: I've been reading up a lot on J2EE lately and it appears more and more that a J2EE app server could quite nicely provide for my needs (despite all shortcomings and issues of course). What is it that you think you'd be getting that you don't have now? Again: I don't know exactly, but when I read stuff about entity, session and message beans, JMS etc., it bears a lot of resemblance to what I'm currently doing by hand, i.e. implementing functionality like that on top of a bare Apache/mod_perl server. A good example would be JMS: you get this for free (with JBoss anyway ;)) in a J2EE app server, but there's no obvious choice for us perl guys. There are some options I see now and then: Spread/Stem/POE, but none of these choices are obvious in the sense that they are being used by a lot of people to solve the type of problems JMS solves, so there's really no one to turn to for advice; again, I'm building stuff between the raw metal and my own stuff. BTW: with the issue of data-sharing, the same thing: I have raw metal (Apache/mod_perl and IPC::MM) and need to implement an API on top of it before I have the needed functionality. Again, I'm building stuff before I can solve my actual business problems. I think these issues point out that we are missing *something*, I know *I* am :) Regards, Bas.
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
At 07:04 PM 11/26/2002 +0100, Bas A.Schulte wrote: On Tuesday, November 26, 2002, at 06:14 PM, Perrin Harkins wrote: Bas A.Schulte wrote: I have been looking at some of the IPC::Share* modules, the one I think I can use is (not sure here) IPC::ShareLite, but that darned thing won't install on my dev. machine (iBook/OS X) so I've been postponing things a bit ;) My current plan is IPC::MM, stay tuned. Hi, Take a look at http://www.officevision.com/pub/p5ee/components.html#shared_storage There are references to every major shared storage method I have seen discussed on the mod_perl list or elsewhere. There are also some interesting links to mod_perl list discussions on performance comparison and synchronization using the various tools. Stephen
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Quite odd. I read the performance thread that's on the P5EE page which showed that DBI (with MySQL underneath) was very fast, came in 2nd. Anyone care to elaborate why this is? After all, shared memory lives in RAM, so why isn't that faster? Hi Bas, You made some really interesting points in your last email ... and I hope it sparks a full discussion. Just a quick point on the MySQL observation above ... MySQL's in-memory HEAP tables may be even quicker, again because the disk is not involved. Your messages could be inserted into a buffer table with a microsecond timestamp and then a separate process (or processes) pops messages off the queue. This hands the memory consumption problem to MySQL and provides multiple ways of talking to the queue (cronjobs, apache kids etc). At Turbo10, our click-through system choked under heavy load until we implemented it as a memory buffer (a MySQL HEAP table) ... just a thought. Nigel -- Nigel Hamilton Turbo10 Metasearch Engine email: [EMAIL PROTECTED] tel: +44 (0) 207 987 5460 fax: +44 (0) 207 987 5468 http://turbo10.com Search Deeper. Browse Faster.
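[Editor's sketch] What Nigel describes might look like this in MySQL 3.23/4.0-era syntax. The table and column names are illustrative, and note that a HEAP table loses its rows on server restart, so it only suits data you can afford to lose or rebuild:

```sql
-- In-memory buffer for incoming messages (names illustrative).
-- TYPE=HEAP keeps the table entirely in RAM. A production table
-- would need a tie-breaker for two messages arriving in the same
-- microsecond.
CREATE TABLE msg_buffer (
    enqueued DOUBLE       NOT NULL,  -- microsecond timestamp (Time::HiRes)
    payload  VARCHAR(255) NOT NULL,
    PRIMARY KEY (enqueued)
) TYPE=HEAP;

-- A separate worker then pops the oldest rows:
-- SELECT enqueued, payload FROM msg_buffer ORDER BY enqueued LIMIT 10;
-- DELETE FROM msg_buffer WHERE enqueued <= <last popped timestamp>;
```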
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Perrin Harkins writes: I think you are vastly over-estimating how much effort JMS/EJB/etc. would save you. EJB doesn't save you anything. It creates work and complexity, esp. Entity Beans. I've built large systems using EJB and Perl. The Perl project was built faster, with fewer people, runs more reliably, runs faster, and the Perl company is still in business, which is the only point that really counts. :-) JMS does solve an interesting problem, but don't use Message Beans, use raw JMS. Make sure JMS isn't a solution looking for a problem, though. Oftentimes, the problem is better and more robustly solved by implementing pending replies from the server. This avoids a number of resource management issues, which can really bog down a server. Rob
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Bas A.Schulte wrote: Quite odd. I read the performance thread that's on the P5EE page which showed that DBI (with MySQL underneath) was very fast, came in 2nd. Anyone care to elaborate why this is? After all, shared memory lives in RAM, so why isn't that faster? I have an article I'm working on which explains all of this, but the short explanation is that they work by serializing the entire memory structure with Storable and stuffing it into a shared memory segment, and even reading it requires loading and de-serializing the whole thing. IPC::MM and the file-based ones are much more granular. Also, file systems are very fast on modern OSes because of efficient VM systems that buffer files in memory. I'm not saying I want entity beans here ;) It's just that I've been doing perl to pay the bills and stuff for the past few years and see a lot of people having some (possibly perceived?) need for something missing in perl. It may be that they just want someone to tell them how they should do things. J2EE does provide that to a certain degree. If I read your mail, you mention some solutions/directions for some problems I'm dealing with, but that's just my issue (I think; it's just coming to me): we have a lot of raw metal but we have to do a lot of welding and fitting before we can solve our business problems. That is basically the point. I don't think it's nearly that bad. After my eToys article got published, I got several e-mails from people saying something like: we want to do this, but our boss says we have to buy something because of all the INFRASTRUCTURE code we would have to write. Infrastructure? What infrastructure? The only stuff we wrote that was really independent of our application logic were things like a logging class and a singleton class, which can now be had on CPAN. We wrote our own cache system, but that's because it worked in a very specific way that the available tools didn't handle. I think I could do that with CPAN stuff now too. 
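[Editor's sketch] Perrin's point about whole-structure serialization is easy to see with core Storable: an IPC::Shareable-style store pays the cost of the first round trip below on every access, while a granular store (IPC::MM, MLDBM::Sync) pays only the second.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# A stand-in for the kind of structure you'd put in shared memory.
my %big = map { ("key$_" => "x" x 100) } 1 .. 1000;

# IPC::Shareable-style cost: the WHOLE structure is frozen and thawed
# on every access, even to read a single key.
my $frozen = freeze(\%big);
my $copy   = thaw($frozen);

# Granular cost: only the one value you care about round-trips.
my $one      = freeze(\($big{key42}));
my $restored = ${ thaw($one) };

printf "whole structure: %d bytes per access\n", length $frozen;
printf "single value:    %d bytes per access\n", length $one;
```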
To illustrate that, I'll try to give a real-world example. Thanks, it's much easier to talk about specific situations. To deliver these messages, I send them off to another server (using my own invented pseudo-RMI to call a method on that server). I would use HTTP for that, because I'm too lazy to write the RMI code myself. 1. The server that does the delivery has plenty of threads (er, an Apache/mod_perl child) so I hope I have enough of them to deliver the messages at the rate the backend server generates them: one child might take up to 5 seconds to deliver the message but there are plenty of children. Not good. I've seen how this works, and how it fails miserably when a delivery mechanism barfs. If they were so quick to process that you could do it that way, I would have just handled them in the original mod_perl server with a cleanup_handler. Obviously they are not, so that's not an option here. 2. Same as 1 but I never allow one delivery mechanism to use all my Apache/mod_perl children by adding some form of IPC (darned, need to solve my data sharing issues first!) I think they are already solved if you look at the modules I suggested. so the children check what the others are currently doing: if a request comes in for a particular delivery mechanism, I check if we're already doing N delivery attempts and drop the request somewhere (database/file, whatever) if not. I have a daemon running that monitors that queue. I would structure it like this: - Original server takes the request, and writes it to a database table that holds the queue. - A cron job checks the queue for messages, reads the status from MLDBM::Sync to see if we have free processes, and passes the request to mod_perl if we do. (Note that this could also be done with something like PersistentPerl instead.) If there are no free processes, they are left on the queue. That daemon gets complicated quickly as it also has to throttle delivery attempts. My approach only puts that logic in the cron job. 
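[Editor's sketch] The cron job Perrin describes could look roughly like this. The table, column, and URL names are invented, and the in-flight count is tracked here with a status column in the queue table itself rather than MLDBM::Sync; only the dispatch arithmetic is shown as runnable code.

```perl
use strict;
use warnings;

# Pure helper: how many queued jobs we may hand out, given how many
# are already in flight and the per-mechanism cap (4 in this thread).
sub dispatch_count {
    my ($queued, $in_flight, $max) = @_;
    my $free = $max - $in_flight;
    $free = 0 if $free < 0;
    return $queued < $free ? $queued : $free;
}

# Cron-driven dispatcher (sketch; DBI/LWP calls are commented out):
# use DBI;
# use LWP::UserAgent;
# my $dbh  = DBI->connect('dbi:mysql:queue', $user, $pass, { RaiseError => 1 });
# my $ua   = LWP::UserAgent->new;
# my $jobs = $dbh->selectall_arrayref(
#     q{select id from msg_queue where status = 'queued' order by id});
# my ($running) = $dbh->selectrow_array(
#     q{select count(*) from msg_queue where status = 'running'});
# my $n = dispatch_count(scalar @$jobs, $running, 4);
# for my $job (@{$jobs}[0 .. $n - 1]) {
#     $dbh->do(q{update msg_queue set status = 'running' where id = ?},
#              undef, $job->[0]);
#     $ua->post('http://localhost/deliver', { id => $job->[0] });
# }
```

When the delivery mechanism is down, jobs simply pile up in `msg_queue` with status 'queued'; throttling is the one line that computes `$n`.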
I need some form of persistent storage (with locking). The relational database. Or MLDBM::Sync if you prefer. what do I do when the delivery mechanism has failed for 6 hours and I have 12000 messages in the queue *and* make sure current messages get sent in time? I don't know, that's an application-specific choice. Of course JMS doesn't know either. 3. I install qmail on the various servers, and use that to push messages around. This'll take me a week or so (hopefully) to get it running reliably in production. One of the major selling points for qmail is easier setup. You could use pretty much any mail server though if you have more experience with something else. I just like qmail because it's fast. Later on, I realise that for each message a full-blown process is forked: load up perl, compile the perl code, etc. I described how to avoid this in another message: use PersistentPerl or equivalent, or pass
Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)
On Friday, Nov 22, 2002, at 02:49 Europe/London, Gunther Birznieks wrote: I disagree. I think it depends on the protocol. A well designed protocol for an application will spread and stand the test of time. Sometimes the protocol doesn't have to be well designed; just that it's standard can help tremendously. eg if we were a world that said HTTP is it and we should do everything over HTTP, then would you really see SMTP over HTTP? SNMP over HTTP? telnet over HTTP? Why? This doesn't really make sense to me. [OT, because I know this isn't really your point] As someone whose entire job revolves around SMTP these days, I'd love to see mail go over HTTP. SMTP's got no concept of negotiation. It's got little in the way of versioning (HELO vs EHLO). It's got no permanent redirect (e.g. [EMAIL PROTECTED] is now [EMAIL PROTECTED]). It's got very weak handling of binary data. Writing mail server plugins is very non-standardised. Don't get me wrong, SMTP is a great protocol, but HTTP is sometimes just *so* much nicer :-) Matt.
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Aaron Johnson wrote: This model has eased my testing as well, since I can run the script completely external to the web server, and I can run it through a debugger if needed. You realize that you can run mod_perl in the debugger too, right? I use the profiler and debugger with mod_perl frequently. - Perrin
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Aaron Johnson wrote: I know you _can_, but I don't find it convenient. For me it's pretty much the same as debugging a command-line script. To debug a mod_perl handler I just do something like this: httpd -X -Ddebug Then I hit the URL with a browser or with GET and it pops me into the debugger. I have httpd.conf set up to add the PerlFixupHandler +Apache::DB line when it sees the debug flag. I still don't like to give apache long processes to manage; I feel this can be better handled external to the server, and in my case it allows for automation/reports on non-mod_perl machines. I try to code it so that the business logic is not dependent on a certain runtime environment, and then write a small mod_perl handler to call it. Then I can use the same modules in cron jobs and such. It can get tricky in certain situations though, when you want to optimize something for a long-running environment but don't want to break it for one-shot scripts. - Perrin
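[Editor's sketch] Perrin's httpd.conf arrangement presumably follows the standard Apache::DB recipe for mod_perl 1, something like this (a sketch, not his actual config):

```apache
# Activated only when Apache is started with: httpd -X -Ddebug
<IfDefine debug>
    <Perl>
        use Apache::DB ();
        Apache::DB->init;
    </Perl>
    <Location />
        PerlFixupHandler +Apache::DB
    </Location>
</IfDefine>
```

With this in place a plain `httpd` start skips the debugger entirely, and `-X` (single process) keeps the interactive debugger from being spread across children.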
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Perrin Harkins writes: I try to code it so that the business logic is not dependent on a certain runtime environment, and then write a small mod_perl handler to call it. I've been doing a lot of test-first coding. It makes it so that you start Apache, and the software just runs. With sufficient granularity of unit tests, we find that we don't use the debugger. Run the test, and it tells you what's wrong. Rob
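[Editor's sketch] For readers unfamiliar with the style Rob means, a unit test in core Test::More looks like this; `queue_position` is an invented example function, not bOP code:

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Invented unit under test: 1-based position of a job id in a queue,
# 0 if the job is not queued.
sub queue_position {
    my ($id, @queue) = @_;
    for my $i (0 .. $#queue) {
        return $i + 1 if $queue[$i] == $id;
    }
    return 0;
}

is queue_position(7, 3, 7, 9), 2, 'finds job in the middle';
is queue_position(3, 3, 7, 9), 1, 'finds job at the head';
is queue_position(1, 3, 7, 9), 0, 'absent job reports position 0';
```

Run it with `perl queue.t` (or via `prove`); a failing `is` prints which test broke and what it got, which is the "tells you what's wrong" Rob refers to.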
Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)
Gunther Birznieks writes: In the context of what you are saying, it seems as if everyone should just stick to using TCP/IP/Telnet as a protocol and then the world would be a better place. Once upon a time, there was OSI, SNA, DECnet, etc. Nowadays, all computers talk IP, even if you connect from AOL. Yes, the other protocols are still around, but nobody in their right mind would recommend them anymore. But I don't think this is so. Everyone ends up creating their own protocols, their own algorithms on top of TCP for how to communicate. Because it's FUN, and you probably can get a Ph.D. thesis out of it. ;-) In a way it is simpler because you just have the freedom to create whatever you want. But in another way, it is a nightmare because everyone will just implement their own way of doing things. This can be OK in some contexts, but I find it difficult to believe that this is the best thing overall. I'm not advocating this. Rather, I am recommending using a well-known, and arguably the most widely-used, protocol: application/x-www-form-urlencoded and its near cousin multipart/form-data. However, that's messy, so we can just call it HTTP, and our implementation is LWP and Apache. At least with J2EE, for every major standard or protocol implemented, there is only one way to do it. With Perl, you actually have more confusion because there are many more ways to do it. More ways to do templating, more ways to do middleware, more ways to do serialization of objects, etc... There are an equivalent number of ways in both languages. If you are saying that you could build a standard component in, say, EJB, and sell it, well, that's just not the case. That's the pipe dream of CORBA. The only thing close to a portable protocol is HTTP. Sabre, for example, gives you a library, and you have to interface to it. However, authorize.net's interface is HTTP, and I can write my own library in 100 lines of Perl, which matches my application and doesn't require me to install anything. 
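[Editor's sketch] The 100-line library Rob mentions is mostly this: encode a form, POST it, read the reply. The gateway URL and field names below are invented, and a real client would also parse the response; only the encoder is shown as runnable code.

```perl
use strict;
use warnings;

# Hand-rolled application/x-www-form-urlencoded encoder -- the sort of
# thing that makes a self-written gateway library so small. Spaces
# become %20 here; the traditional form encoding also allows '+'.
sub form_encode {
    my (%params) = @_;
    my @pairs;
    for my $key (sort keys %params) {
        my ($k, $v) = map {
            my $s = $_;
            $s =~ s/([^A-Za-z0-9_.~-])/sprintf '%%%02X', ord $1/ge;
            $s;
        } ($key, $params{$key});
        push @pairs, "$k=$v";
    }
    return join '&', @pairs;
}

# Posting it is a few lines of LWP (endpoint and fields illustrative):
# use LWP::UserAgent;
# my $res = LWP::UserAgent->new->post(
#     'https://gateway.example.com/transact',
#     Content_Type => 'application/x-www-form-urlencoded',
#     Content      => form_encode(x_login => $login, x_amount => '9.95'),
# );
```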
There's such a thing as standard protocols, but every application uses them differently. Rob
asynchronous execution, was Re: implementing a set of queue-processing servers
At 08:18 PM 11/18/2002 -0700, Rob Nagler wrote: We digress. The problem is to build a UI to Sabre. I still haven't seen any numbers which demonstrate the simple solution doesn't work. Connecting to Sabre is no different than connecting to an e-commerce gateway. Both can be done by connecting directly from the Apache child to the remote service and returning a result. Hi, My question with this approach is not whether it works for synchronous execution (the user is willing to wait for the results to come back) but whether it makes sense for asynchronous execution (the user will come back and get the results later). In fact, we provide our users with the option: 1. fetch the data now and display it, OR 2. put the request in a queue to be fetched and then later displayed We have a fixed number of mainframe login id's, so we can only run a limited number (say 4) of them at a time. So what I think you are saying for option 2 is: * Apache children (web server processes with mod_perl) have two personalities: - user request processors - back-end work processors * When a user submits work to the queue, the child is acting in a user request role and it returns the response quickly. * After detaching from the user, however, it checks to see if fewer than four children are processing the queue and if so, it logs into the mainframe and starts processing the queue. * When it finishes the request, it continues to work the queue until no more work is available, at which time, it quits its back-end processor personality and returns to wait for another HTTP request. This just seems a bit odd (and unnecessarily complex). Why not let there be web server processes and queue worker processes and they each do their own job? Web servers seem to me to be for synchronous activity, where the user is waiting for the results. Stephen P.S. 
Another limitation of the "use Apache servers for all server processing" philosophy seems to be scheduled events or system events (those not initiated by an HTTP request, which are user events). example: Our system allows users to set up a schedule of requests to be run, i.e. Every Tuesday at 3:00am, put this request into the queue. This is a scheduled event rather than a user event. How is a web server process going to wake up and begin processing this? (unless of course everyone who puts something into the queue must send a dummy HTTP request to wake up the web servers)
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Stephen Adkins wrote: So what I think you are saying for option 2 is: * Apache children (web server processes with mod_perl) have two personalities: - user request processors - back-end work processors * When a user submits work to the queue, the child is acting in a user request role and it returns the response quickly. * After detaching from the user, however, it checks to see if fewer than four children are processing the queue and if so, it logs into the mainframe and starts processing the queue. * When it finishes the request, it continues to work the queue until no more work is available, at which time, it quits its back-end processor personality and returns to wait for another HTTP request. This just seems a bit odd (and unnecessarily complex). It does when you put it like that, but it doesn't have to be that way. I would separate the input (user or queue) from the processing part. You'd have a module that runs in mod_perl which knows how to process requests. You have a separate module which can provide a UI for placing requests. Synchronous ones go straight to processing, while asynch ones get added to the queue. You'd also have a controlling process that polls the queue and if it finds anything it uses LWP to send it to mod_perl for handling. I would make this a tiny script triggered from cron if possible, since cron is robust and can handle outages and error reporting nicely. Why not let there be web server processes and queue worker processes and they each do their own job? Web servers seem to me to be for synchronous activity, where the user is waiting for the results. When I think of queue processing, I think of a system for handling tasks in parallel that provides a simple API for plugging in logic, a well-defined control interface, logging, easy configuration... sounds like Apache to me. You just need a tiny control process to trigger it via LWP. 
Apache is already a system for handling a queue of HTTP requests in parallel, so you just have to make your requests look like HTTP. You certainly could do this other ways, but you'd probably have to write a lot more code or else use something far less reliable than Apache. P.S. Another limitation of the "use Apache servers for all server processing" philosophy seems to be scheduled events or system events (those not initiated by an HTTP request, which are user events). Cron/at + LWP. - Perrin
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Hi Stephen, On Tue, 19 Nov 2002, Stephen Adkins wrote: My question with this approach is not whether it works for synchronous execution (the user is willing to wait for the results to come back) but whether it makes sense for asynchronous execution (the user will come back and get the results later). What kind of interface will you provide to the final users? In fact, we provide our users with the option: 1. fetch the data now and display it, OR 2. put the request in a queue to be fetched and then later displayed We have a fixed number of mainframe login id's, so we can only run a limited number (say 4) of them at a time. So it is possible that an immediate request is queued if the system has already reached its maximum allowed logins. In other words, a final user can request to display data immediately but your middleware can answer that the request has been queued, possibly saying 'your job id, position n, see you later'. Moreover, you must preserve the order of requests. And if I recall correctly you talked about a sort of queue listing and some job manipulation. Whatever your choice will be, you undoubtedly need to serialize requests and enqueue them using a dbms. It is the simplest approach. Given methods to add, list or remove requests from this kind of queue, mod_perl (even plain cgi scripts) can use these methods to manipulate a user's jobs. Using the access control supplied by Apache, it is possible to give different access rights to users of the middleware. Requests from final users will always be enqueued by an Apache child, which will get a job-id and its position in the queue. If the job is at the top of the queue, you will immediately wait for its completion. Otherwise you can tell the user to check his job queue later. Users can remove jobs from the queue. All completed jobs will be stored somewhere (file system and/or db) and can be listed by legitimate users. Completed jobs will show up in a separate queue. 
An external entity will dequeue jobs and process them, probably using something like Parallel::ForkManager to limit concurrent requests. Another entity will enqueue recurring jobs. Jobs scheduled for future processing should always be enqueued immediately, or I can't imagine a coherent interface to remove jobs. These entities look like daemons that can be spawned and controlled using code executed by Apache. Please note that I never mentioned html; using Apache as your infrastructure you can build whatever interface you need. Requests recorded inside the db can also be used to implement a cache, probably reused by subsequent requests. It would be possible to collapse identical requests (to save logins). Obviously it is possible to replace Apache with POE or Stem, but I don't know how, sorry. There are many other solutions, but this sketch describes my way to do it. Sorry for the length of this message. Ciao, Valerio Valerio Paolini, http://130.136.3.200/~paolini -- Linux, the Cheap Chic for Computer Fashionistas
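[Editor's sketch] The dequeuing entity Valerio mentions, sketched with nothing but core fork/waitpid; Parallel::ForkManager wraps the same bookkeeping in a nicer API. The jobs are passed in as code refs so the sketch stays independent of any particular queue.

```perl
use strict;
use warnings;

# Run jobs (code refs) in child processes, at most $max at a time.
sub run_limited {
    my ($max, @jobs) = @_;
    my %kids;
    for my $job (@jobs) {
        if (scalar(keys %kids) >= $max) {
            my $done = waitpid(-1, 0);    # block until one child exits
            delete $kids{$done};
        }
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        if ($pid == 0) { $job->(); exit 0 }    # child: process one job
        $kids{$pid} = 1;
    }
    waitpid($_, 0) for keys %kids;             # reap the stragglers
}

# With the 4-login limit from this thread:
# run_limited(4, map { my $j = $_; sub { process_job($j) } } @queued_jobs);
```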
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
On Tue, 2002-11-19 at 16:28, Stephen Adkins wrote:

At 08:18 PM 11/18/2002 -0700, Rob Nagler wrote: We digress. The problem is to build a UI to Sabre. I still haven't seen any numbers which demonstrate that the simple solution doesn't work. Connecting to Sabre is no different from connecting to an e-commerce gateway. Both can be done by connecting directly from the Apache child to the remote service and returning a result.

Hi,

My question with this approach is not whether it works for synchronous execution (the user is willing to wait for the results to come back) but whether it makes sense for asynchronous execution (the user will come back and get the results later). In fact, we give our users the option:

1. fetch the data now and display it, OR
2. put the request in a queue to be fetched and then later displayed

We have a fixed number of mainframe login IDs, so we can only run a limited number (say 4) of sessions at a time. So what I think you are saying for option 2 is:

* Apache children (web server processes with mod_perl) have two personalities:
  - user request processors
  - back-end work processors
* When a user submits work to the queue, the child acts in its user request role and returns the response quickly.
* After detaching from the user, however, it checks whether fewer than four children are processing the queue and, if so, logs into the mainframe and starts processing the queue.
* When it finishes the request, it continues to work the queue until no more work is available, at which time it drops its back-end processor personality and returns to waiting for another HTTP request.

This just seems a bit odd (and unnecessarily complex). Why not let there be web server processes and queue worker processes, each doing its own job? Web servers seem to me to be for synchronous activity, where the user is waiting for the results.

I am doing something similar right now in a project. It has to make approx. 220 requests to outside sources in order to compile a completed report. These reports vary in time to create based on the data sources and network traffic. This is the solution I have in place currently:

1) The user visits a web page (handled by mod_perl) and requests a report.

2) The request parameters are stored in a temp file and the user is redirected to a wait page. The time spent on the wait page varies; an approximate wait time is computed based on query complexity. The user's session is given a key that matches the temp file name.

3) A separate dedicated server (Proc::Daemon based) picks up the temp file and spawns a child to process it. This daemon looks for new temp files every X seconds, where X is currently 15 seconds but could easily be adjusted. It keeps a queue of the temp files that have been processed and drops them from the queue after 45 minutes even if they haven't run.

4) The child recreates the user's object and runs the report; when it completes, it deletes the temp file. If it fails to complete, the temp file remains.

5) When the auto-refresh takes place, the system determines whether the user's request has completed by looking for the temp file named in their session data. If the file exists, they are given another wait page with a 30 to 120 second wait time. If it doesn't exist, the cached information from the report (just an XML file created from an XML::Simple dump of the hash containing the report data) is processed and presented as HTML to the user.

I had attempted a mod_perl-only solution, but I didn't like tying up the server with additional processing that could be handled externally. This method also allows the server script to reside on a separate machine (given some shared filesystem: Samba, NFS, etc.) without having to recreate an entire mod_perl environment. This model has eased my testing as well: since the script runs completely external to the web server, I can run it through a debugger if needed.
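The temp-file handshake described above (steps 2 and 5) could be sketched roughly like this. The spool directory, the key scheme, and the flat "key=value" file format here are my own illustrative assumptions, not the poster's actual code:

```perl
use strict;
use warnings;

# Assumed spool directory shared between mod_perl and the report daemon.
my $SPOOL = '/var/spool/reports';

# Step 2: serialize the request parameters to a temp file whose name
# matches the key stored in the user's session.
sub enqueue_request {
    my ($key, %params) = @_;
    open my $fh, '>', "$SPOOL/$key.req" or die "open $SPOOL/$key.req: $!";
    print {$fh} "$_=$params{$_}\n" for sort keys %params;
    close $fh or die "close: $!";
    return $key;
}

# Step 5: on auto-refresh, the request is done once the temp file is
# gone and the cached XML report exists; otherwise serve another wait page.
sub report_ready {
    my ($key) = @_;
    return !-e "$SPOOL/$key.req" && -e "$SPOOL/$key.xml";
}
```

The point of the sketch is that the temp file's existence is the only state the web side needs to poll, so no shared memory or database row is required for the handshake.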
I also use the same script for nightly automated common reports to limit the number of real-time requests, since the data doesn't change that frequently in my case.

Stephen

P.S. Another limitation of the "use Apache servers for all server processing" philosophy seems to be scheduled events or system events (those not initiated by an HTTP request, which are user events). I agree with Perrin: you can use LWP to emulate a user's HTTP request if you want to use an HTTP-style request. cron/at represents the best way to handle this (IMHO). In my case I run the cron job and it generates the temp files; these temp files get picked up by the looping server (a simple non-mod_perl daemon) and processed. So I don't use LWP, but I could send the request to the server and have it create the temp files just as easily; I just happen to have the logic abstracted to where I don't need to involve mod_perl.

Aaron
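One pass of the looping server described above (poll the spool directory, hand new temp files to a child, drop entries queued longer than 45 minutes) might look roughly like this. The function name and pruning details are illustrative assumptions, not the actual daemon:

```perl
use strict;
use warnings;

# File => epoch time first seen; a real daemon keeps this for the
# life of the process.
my %queued;

# One polling pass: return the temp files ready to be handed to a
# child, and drop entries that have been queued longer than 45 minutes.
sub scan_spool {
    my ($spool, $now) = @_;
    my @ready;
    for my $file (sort glob "$spool/*.req") {
        $queued{$file} = $now unless exists $queued{$file};
        if ($now - $queued{$file} > 45 * 60) {   # the 45-minute expiry rule
            delete $queued{$file};
            next;
        }
        push @ready, $file;
    }
    return @ready;
}

# A real daemon would detach with Proc::Daemon::Init() and then loop:
#   while (1) { process_in_child($_) for scan_spool($SPOOL, time); sleep 15 }
# where process_in_child() forks, recreates the user's object, runs the
# report, and unlinks the temp file on success (step 4 above).
```

Keeping the scan as a plain function like this is also what makes the "run it through a debugger" workflow easy: the polling logic can be exercised outside the daemon entirely.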
Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)
Rob Nagler wrote: The antithesis of this is J2EE, which introduces an amazing amount of complexity through protocol explosion (is it a Message/Session/Entity Bean, do I use JMX, JMS, RMI, etc.?). It creates tremendous confusion, and their software is certainly less reliable than Apache.

I think this is not a fair statement about J2EE (except the less-reliable part). In the context of what you are saying, it seems as if everyone should just stick to using TCP/IP/Telnet as a protocol and then the world would be a better place. But I don't think this is so. Everyone ends up creating their own protocols, their own algorithms on top of TCP for how to communicate. In a way it is simpler, because you have the freedom to create whatever you want. But in another way it is a nightmare, because everyone implements their own way of doing things. This can be OK in some contexts, but I find it difficult to believe that it is the best thing overall.

At least with J2EE, for every major standard or protocol implemented, there is only one way to do it. With Perl, you actually have more confusion because there are many more ways to do it: more ways to do templating, more ways to do middleware, more ways to do serialization of objects, etc.