Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Perrin Harkins writes: Bas A.Schulte wrote: What do I do when the delivery mechanism has failed for 6 hours and I have 12000 messages in the queue *and* make sure current messages get sent in time? I don't know, that's an application-specific choice. Of course JMS doesn't know either. This is one of the endemic problems with J2EE. It doesn't know, and it has to offer you lots of options to allow you to control the horizontal and vertical. Since it is a distributed platform, it can't export hooks (callbacks), which allow you to decide on the fly. The options get out of control, and make it look like the system is fancier than it really is. Rather, when you see an option, it usually means the developers couldn't agree on what to do (paraphrased from Joel Spolsky, http://www.joelonsoftware.com/). With bOP, we tend to make policy decisions like this centrally, e.g., no exactly-once semantics. There's a real cost, but then we've used bOP for a wide variety of batch and Web applications without much strain, so we keep doing it this way. When we stress the system too much, we add a decision point (option) for the programmer. However, we only do this after careful deliberation. This is one of the reasons we don't release bOP in parts as some have suggested. You can use it in layers, but every application we've built ends up using all the layers. J2EE has too many competing/conflicting components, and each of those components can be configured in myriad ways. Only experienced distributed-systems builders can know the trade-offs. J2EE is sold as an everyman's platform for everybody's problem. This means people often get caught using the wrong tool (entity beans) the wrong way (a bean per DB row). There's no easy answer to the problem of distributed systems (esp. one as complex as SMS message queueing), and J2EE gives one the impression there is, all imho, of course.
:-) BTW, the issue of exactly-once vs at-most-once is a tough one (and was subject to much debate in the 80s). JMS tries to guarantee exactly-once, but that's really hard to do. Especially in an SMS situation where network partitioning is a real problem. My alphanumeric pager service holds messages for 3 days, and that's a long time imo. They can only do this, because pagers aren't bi-directional (for the most part). Once you get into SMS space, where devices are bi-directional and much more useful, you have a real problem promising exactly-once semantics. Rob
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Hi all, On Tuesday, November 19, 2002, at 11:09 PM, Perrin Harkins wrote: Stephen Adkins wrote: So what I think you are saying for option 2 is:
* Apache children (web server processes with mod_perl) have two personalities: user request processors and back-end work processors.
* When a user submits work to the queue, the child is acting in a user request role and it returns the response quickly.
* After detaching from the user, however, it checks to see if fewer than four children are processing the queue and if so, it logs into the mainframe and starts processing the queue.
* When it finishes the request, it continues to work the queue until no more work is available, at which time it quits its back-end processor personality and returns to wait for another HTTP request.
This just seems a bit odd (and unnecessarily complex). It does when you put it like that, but it doesn't have to be that way. I've implemented the exact thing Perrin describes in our SMS game platform (read a bit about it here: http://perl.apache.org/outstanding/success_stories/sms_server.html). When synchronous requests come in that trigger some event that has to take place in the future *and* that runs in the same Apache server instance, I have an external (simple) daemon that reads timer events from a shared database table and posts HTTP requests to the Apache server instance. The reason I did it like this is that I can easily (not to mention quickly) run perl code in Apache *and* it is quite a stable server, much more stable than something I could whip up in perl. I did try some perl preforking server code (from Lincoln D. Stein's book and Net::Server::PreFork as well as some self-programmed stuff) but none of them seemed to be stable/fast under heavy load, even though I would have preferred that as it would allow me to handle data-sharing between children via the parent, which always seems to be an issue in Apache/mod_perl.
The only thing that now and then is problematic is that Apache child processes in which my perl code runs are not easily coordinated (at least I still haven't found a good way). So this situation (from Stephen's mail): We have a fixed number of mainframe login id's, so we can only run a limited number (say 4) of them at a time. still is something I haven't figured out. Basically, I need some way to coordinate the children so each child can find out what the other children are doing. BTW: I've been reading up a lot on J2EE lately and it appears more and more that a J2EE app server could quite nicely provide for my needs (despite all shortcomings and issues of course). Now if there only was a P5EE app server ;) Regards, Bas.
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Bas A.Schulte writes: still is something I haven't figured out. Basically, I need some way to coordinate the children so each child can find out what the other children are doing. Use a table in your database. The DB needs to support row level locking (we use Oracle). Here's an example: insert into resource_lock_t (instance_count) values (1) Don't commit yet. Rather, right before committing, delete the row: delete from resource_lock_t where instance_count = 1 Anybody waiting for instance_count #1 will block until the delete happens. You only allow up to four inserts (instance_count is the primary key). Rob
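Rob's pattern, sketched with DBI; the connection details, error handling, and the cap of four are assumptions drawn from his description, and AutoCommit must be off so the insert's row lock is held until the work is finished:

```perl
use strict;
use DBI;

# Assumed schema: create table resource_lock_t (instance_count int primary key);
my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'pass',
    { AutoCommit => 0, RaiseError => 1, PrintError => 0 });

my $MAX_SLOTS = 4;    # one slot per available mainframe login
my $slot;
for my $i (1 .. $MAX_SLOTS) {
    # Inserting a taken primary key blocks until its holder deletes
    # the row and commits; if the holder keeps it, the insert fails
    # with a duplicate-key error and we try the next slot.
    eval {
        $dbh->do('insert into resource_lock_t (instance_count) values (?)',
                 undef, $i);
        $slot = $i;
    };
    last if defined $slot;
}
die "all $MAX_SLOTS slots busy\n" unless defined $slot;

# ... log into the mainframe and work the queue ...

# Release right before committing, as Rob describes.
$dbh->do('delete from resource_lock_t where instance_count = ?',
         undef, $slot);
$dbh->commit;
```

The insert/delete pair turns the database's row-level locking into a counting semaphore without any extra IPC machinery.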
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Bas A.Schulte wrote: none of them seemed to be stable/fast under heavy load even though I would have preferred that as it would allow me to do something to handle data-sharing between children via the parent which always seems to be in issue in Apache/mod_perl. What are you trying to share? In addition to Rob's suggestion of using a database table (usually the best for important data or clustered machines) there are other approaches like IPC::MM and MLDBM::Sync. Basically, I need some way to coordinate the children so each child can find out what the other children are doing. Either of the approaches I just mentioned would be fine for this. BTW: I've been reading up a lot on J2EE lately and it appears more and more that a J2EE app server could quite nicely provide for my needs (despite all shortcomings and issues of course). What is it that you think you'd be getting that you don't have now? - Perrin
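A minimal sketch of the MLDBM::Sync approach Perrin mentions, with each child recording what it is doing under its PID; the file path and hash keys are invented for illustration:

```perl
use strict;
use MLDBM::Sync;                  # wraps each access in a lock
use MLDBM qw(DB_File Storable);   # DB_File back end, Storable serializer
use Fcntl qw(:DEFAULT);

my %status;
tie %status, 'MLDBM::Sync', '/tmp/child_status.dbm',
    O_CREAT | O_RDWR, 0640
    or die "cannot tie status dbm: $!";

# A child announces what it is doing (hypothetical fields):
$status{$$} = { task => 'delivering', mechanism => 'sms-gateway-1' };

# Any child can then count how many siblings are busy delivering:
my $busy = grep { $_->{task} eq 'delivering' } values %status;

# Clear our entry when the work is done.
delete $status{$$};
```

Because MLDBM::Sync fetches one key at a time, reads stay cheap even as the status table grows, unlike the serialize-the-whole-structure shared-memory modules.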
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Nigel Hamilton wrote: I need to fork a lot of processes per request ... the memory cost of forking an apache child is too high though. So I've written my own mini webserver in Perl It doesn't seem like this would help much. The thing that makes mod_perl processes big is Perl. If you run the same code in both they should have a similar size. - Perrin
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Hi Perrin, On Tuesday, November 26, 2002, at 06:14 PM, Perrin Harkins wrote: Bas A.Schulte wrote: none of them seemed to be stable/fast under heavy load even though I would have preferred that as it would allow me to do something to handle data-sharing between children via the parent which always seems to be an issue in Apache/mod_perl. What are you trying to share? In addition to Rob's suggestion of using a database table (usually the best for important data or clustered machines) there are other approaches like IPC::MM and MLDBM::Sync. I don't want to use a database table for the sole purpose of sharing data, I mean, I run the Apache/mod_perl servers to handle different components of our system, some run on top of a database and some of them don't. Also, the things I would want to share are fairly dynamic things so a roundtrip to a database would probably add quite some overhead. I have been looking at some of the IPC::Share* modules, the one I think I can use is (not sure here) IPC::ShareLite, but that darned thing won't install on my dev. machine (iBook/OS X) so I've been postponing things a bit ;) My current plan is IPC::MM, stay tuned. As to *what* I'm trying to share: I don't really know yet ;) Dynamic stuff like: - what is a given child doing (to do things like: ok, I'm currently pushing data to some client in 5 children, and I don't want to have another child do this now so stuff this task in a queue somewhere so I can process it later); - application state. This is domain-specific so it's a bit hard to explain what I mean. I need serialized and *fast* access to this info so I would prefer not having this in my database. NB: I posted a question on the first issue (look for IPC suggestions sought/talking between children? somewhere in the mod_perl mailinglist, I never seem to recall the proper archive site for it), didn't get any feedback on it as it probably goes beyond what someone would normally want from a web server.
BTW: I've been reading up a lot on J2EE lately and it appears more and more that a J2EE app server could quite nicely provide for my needs (despite all shortcomings and issues of course). What is it that you think you'd be getting that you don't have now? Again: I don't know exactly, but when I read stuff about entity-, session- and message beans, JMS etc., it has a lot of resemblance with what I'm currently doing by hand, i.e. implementing functionality like that on top of a bare Apache/mod_perl server. A good example would be JMS: you get this for free (with JBoss anyway ;)) in a J2EE app. server but there's no obvious choice for us perl guys. There are some options I see now and then: Spread/Stem/POE, but none of these choices are obvious in the sense that they are being used by a lot of people to solve the type of problems JMS solves so there's really no one to turn to for advice; again, I'm building stuff between the raw metal and my own stuff. BTW: with the issue on data-sharing: the same thing: I have raw metal (Apache/mod_perl and IPC::MM) and need to implement an API on top of them before I have the needed functionality. Again, I'm building stuff before I can solve my actual business problems. I think these issues point out that we are missing *something*, I know *I* am :) Regards, Bas.
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
At 07:04 PM 11/26/2002 +0100, Bas A.Schulte wrote: On Tuesday, November 26, 2002, at 06:14 PM, Perrin Harkins wrote: Bas A.Schulte wrote: I have been looking at some of the IPC::Share* modules, the one I think I can use is (not sure here) IPC::ShareLite, but that darned thing won't install on my dev. machine (iBook/OS X) so I've been postponing things a bit ;) My current plan is IPC::MM, stay tuned. Hi, Take a look at http://www.officevision.com/pub/p5ee/components.html#shared_storage There are references to every major shared storage method I have seen discussed on the mod_perl list or elsewhere. There are also some interesting links to mod_perl list discussions on performance comparison and synchronization using the various tools. Stephen
Re: implementing a set of queue-processing servers
On Mon, Nov 25, 2002 at 07:31:35PM -0700, Rob Nagler wrote: Matt Sergeant writes: There's a huge difference in what they are trying to achieve though. POE doesn't open any files and it doesn't write any files to disk. None of it is written in C (yet), so unless there's a buffer overrun or type mismatch bug in perl you can exploit, you're not going to get in that way. I agree that Perl is a safe language (independent of taint, which adds safety). Unfortunately, there has been a history of insecure Perl programs (formail.pl, I think being the most famous). This may be a consequence of bad programming, but you have to look at the average if you are selecting a system without reviewing every line of code, i.e., performing a security audit. Rating all of CPAN according to the quality of the average module does a disservice to its better half. Depreciating its good distributions also feeds into the myth that all Perl software is shoddy. I trust Linux more than Apache, for example, because Linux is not only older, but was built using an interface design which is 30 years old and has been allowed to evolve. It seems naive to assume that an older project is more reliable than a younger one. Inception dates have no bearing on the age and quality of source code, otherwise djbdns would be considered less reliable than bind. I'm not honestly suggesting it's bug free, but I fail to see how a bug in POE would give you access to the system. Use of a user string incorrectly in a system or open might do it. Also, an incorrect chown, chmod, umask, etc. A casual grep through POE's source would reveal that it doesn't do any of this. You seem to be making claims against POE based on broad generalization rather than research. Regardless of your intent, representing these opinions as facts does damage the project's reputation, since they are available out of context and forever through the list's archives. -- Rocco Caputo - [EMAIL PROTECTED] - http://poe.perl.org/
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Quite odd. I read the performance thread that's on the P5EE page which showed that DBI (with MySQL underneath) was very fast, came in 2nd. Anyone care to elaborate why this is? After all, shared-memory is a thing in RAM, why isn't that faster? Hi Bas, You made some really interesting points in your last email ... and I hope it sparks a full discussion. Just a quick point on the MySQL observation above ... MySQL Memory-Hash Tables may be even quicker, again - as the disk is not involved. Your messages could be inserted into a buffer table with a microsecond timestamp and then a separate process(es) pops messages off the queue. This hands the memory consumption problem to MySQL and provides multiple ways of talking to the queue (cronjobs, apache kids etc). At Turbo10, our click-through system choked under heavy load until we implemented it as a memory buffer (MySQL hash table) ... just a thought. Nigel -- Nigel Hamilton Turbo10 Metasearch Engine email: [EMAIL PROTECTED] tel:+44 (0) 207 987 5460 fax:+44 (0) 207 987 5468 http://turbo10.com Search Deeper. Browse Faster.
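Nigel's in-memory buffer idea might look something like this through DBI; the table and column names are invented, and `TYPE=HEAP` is the MySQL 3.x/4.0-era syntax for what later versions call MEMORY tables (RAM-only, so the buffer is lost on server restart):

```perl
use strict;
use DBI;
use Time::HiRes qw(gettimeofday);

my $dbh = DBI->connect('dbi:mysql:queue_db', 'user', 'pass',
    { RaiseError => 1 });

# RAM-only buffer table: no disk I/O on insert or select.
$dbh->do(q{
    create table if not exists msg_buffer (
        stamp   double not null,       -- microsecond timestamp
        payload varchar(255) not null
    ) type=HEAP
});

# Producer (e.g. an Apache child) pushes a message with a
# sub-second timestamp so ordering survives bursts.
$dbh->do('insert into msg_buffer (stamp, payload) values (?, ?)',
         undef, scalar(gettimeofday()), 'MSG for +31612345678');

# A separate consumer (cron job, daemon, another apache kid)
# pops the oldest messages off the buffer.
my $rows = $dbh->selectall_arrayref(
    'select stamp, payload from msg_buffer order by stamp limit 10');
```

As Nigel notes, this hands the memory-management problem to MySQL and lets anything that can speak DBI act as a producer or consumer.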
Re: implementing a set of queue-processing servers
Rocco Caputo writes: Rating all of CPAN according to the quality of the average module does a disservice to its better half. Depreciating its good distributions also feeds into the myth that all Perl software is shoddy. That isn't what I said. I program Perl daily. I use a bunch of CPAN on a daily basis. It's important to look at the average of all software. It's just like I would rather fly in an airplane 100 miles than drive 100 miles. It seems naive to assume that an older project is more reliable than a younger one. Inception dates have no bearing on the age and quality of source code, otherwise djbdns would be considered less reliable than bind. Is "old code is good code" a myth then? It's certainly bandied about often enough. Use of a user string incorrectly in a system or open might do it. Also, an incorrect chown, chmod, umask, etc. A casual grep through POE's source would reveal that it doesn't do any of this. I looked briefly at UserBase.pm, because it seems to have something to do with security. I came up with a few questions which weren't easily resolved. There are probably good answers to all my questions, but I'm a fairly experienced programmer and my casual observations didn't find them. I wouldn't find easy answers for Apache either, but I *trust* Apache from its reputation alone. That's the best I can do, and that's what I've been arguing about. Anyway, here's a quick list:

-d $heap->{Dir} || mkdir $heap->{Dir}, 0755;

Is $heap->{Dir} supposed to be readable by everybody? What is $heap->{Dir}? Will it contain data from the heap on disk? What if there's a clear text password in the heap?

open FILE,$heap->{File} or croak qq($heap->{_type} could not open '$heap->{File}'.);

This contains a small error: there should always be a space after the comma.

unlink $heap->{Dir}/$href->{user_name} if $href->{new_user_name};

What if $heap->{Dir} is misconfigured and set to /var/mail? Is POE running as root?
sub poco_userbase_update { my $heap = $_[HEAP]; my $protocol = $heap->{Protocol}; my %params = splice @_, ARG0; for ($heap->{Cipher}) {

$_ is set, and it isn't local($_). This is a problem, because other code gets the changed value. Always use lexically scoped variables. Dynamically scoped variables are a major source of unexpected behavior. Nit: Barewords are bad imho. ARG0 and HEAP should be subroutines or methods.

my $stm = <<_EOSTM_;
delete from $heap->{Table} where $heap->{UserColumn} = '$href->{user_name}'
_EOSTM_
$stm .= qq[ and $heap->{DomainColumn} = '$href->{domain}'] if $href->{domain};

This is naive SQL. What if the user_name or domain has a ' in it? What if it contains arbitrary code such as: dontcare' OR user_name like '% Bad news. Use '?' for all arguments except constants. The result isn't checked to see how many records were deleted either. You seem to be making claims against POE based on broad generalization rather than research. Regardless of your intent, representing these opinions as facts does damage the project's reputation, since they are available out of context and forever through the list's archives. I have no doubt POE is written well and certainly with the best intentions. Let that stand in the archives forever. However, the debate was not about POE vs Apache, but essentially about old code vs new code--with a side discussion about security through obscurity. Given two packages, which I'm not familiar with, I'll take the older one over the new one any day. Rob
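Rob's fix, sketched with DBI placeholders; the variable names below are hypothetical stand-ins for the $heap/$href configuration in the quoted snippet. The quoting problem disappears because the driver passes the values out-of-band instead of interpolating them into the SQL text:

```perl
use strict;
use DBI;

# Illustrative stand-ins for the quoted code's configuration.
my $dbh = DBI->connect('dbi:mysql:users', 'user', 'pass', { RaiseError => 1 });
my ($table, $user_col, $domain_col) = ('users_t', 'user_name', 'domain');
my ($user_name, $domain) = ("dontcare' OR user_name like '%", 'example.com');

# With ? placeholders the hostile user_name above is matched literally,
# never parsed as SQL. (Identifiers such as the table name cannot be
# placeholders -- those must still come from trusted configuration.)
my $sql  = "delete from $table where $user_col = ?";
my @args = ($user_name);
if ($domain) {
    $sql .= " and $domain_col = ?";
    push @args, $domain;
}

# Also check the row count, the other omission Rob points out.
my $deleted = $dbh->do($sql, undef, @args);
warn "expected to delete 1 user, got $deleted" unless $deleted == 1;
```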
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Perrin Harkins writes: I think you are vastly over-estimating how much effort JMS/EJB/etc. would save you. EJB doesn't save you anything. It creates work and complexity, esp. Entity Beans. I've built large systems using EJB and Perl. The Perl project was built faster, with fewer people, runs more reliably, runs faster, and the Perl company is still in business, which is the only point that really counts. :-) JMS does solve an interesting problem, but don't use Message Beans, use raw JMS. Make sure JMS isn't a solution looking for a problem, though. Oftentimes the problem is better and more robustly solved by implementing pending replies from the server. This avoids a number of resource management issues, which can really bog down a server. Rob
Re: implementing a set of queue-processing servers
On Tue, Nov 26, 2002 at 04:26:13PM -0700, Rob Nagler wrote: Rob Nagler also wrote: I trust Linux more than Apache, for example, because Linux is not only older, but was built using an interface design which is 30 years old and has been allowed to evolve. Rocco Caputo wrote: It seems naive to assume that an older project is more reliable than a younger one. Inception dates have no bearing on the age and quality of source code, otherwise djbdns would be considered less reliable than bind. Rob Nagler again: Is "old code is good code" a myth then? It's certainly bandied about often enough. First I'd like to apologize for reading more into your posts than you intended. Thanks for making things clear in your last message. On average, older projects may tend to be more reliable than younger ones, but "old code is good code" is not a hard rule. It also applies more to code than to projects like Linux and Apache as whole things. The age of a project is no guarantee of the age of its code. The assertion also assumes at least three things about code. It relies on all code being born at the same level of quality. It demands that all code progresses towards Quality Nirvana at a constant rate. It assumes that updates never make things worse than before. Rocco Caputo writes: Rob Nagler wrote: Use of a user string incorrectly in a system or open might do it. Also, an incorrect chown, chmod, umask, etc. A casual grep through POE's source would reveal that it doesn't do any of this. I looked briefly at UserBase.pm, because it seems to have something to do with security. I came up with a few questions which weren't easily resolved. There are probably good answers to all my questions, but I'm a fairly experienced programmer and my casual observations didn't find them. I wouldn't find easy answers for Apache either, but I *trust* Apache from its reputation alone. That's the best I can do, and that's what I've been arguing about. [...]
UserBase is a third-party module using POE, but it's not part of POE itself. The relationship between the two is similar to the one between J. Random CPAN Module and Perl. Your comments are very useful, though. Thank you. I'll forward them to the module's author. -- Rocco Caputo - [EMAIL PROTECTED] - http://poe.perl.org/
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Bas A.Schulte wrote: Quite odd. I read the performance thread that's on the P5EE page which showed that DBI (with MySQL underneath) was very fast, came in 2nd. Anyone care to elaborate why this is? After all, shared-memory is a thing in RAM, why isn't that faster? I have an article that I'm working on which explains all of this, but the short explanation is that they work by serializing the entire memory structure with Storable and stuffing it into a shared memory segment, and even reading it requires loading and de-serializing the whole thing. IPC::MM and the file-based ones are much more granular. Also, file systems are very fast on modern OSes because of efficient VM systems that buffer files in memory. I'm not saying I want entity beans here ;) It's just that I've been doing perl to pay for bills and stuff the past few years and see a lot of people having some (possibly perceived?) need for something missing in perl. It may be that they just want someone to tell them how they should do things. J2EE does provide that to a certain degree. If I read your mail, you mention some solutions/directions for some problems I'm dealing with, but that's just my issue (I think; it's just coming to me): we have a lot of raw metal but we do have to do a lot of welding and fitting before we can solve our business problems. That is basically the point. I don't think it's nearly that bad. After my eToys article got published, I got several e-mails from people saying something like we want to do this, but our boss says we have to buy something because of all the INFRASTRUCTURE code we would have to write. Infrastructure? What infrastructure? The only stuff we wrote that was really independent of our application logic were things like a logging class and a singleton class, which can now be had on CPAN. We wrote our own cache system, but that's because it worked in a very specific way that the available tools didn't handle. I think I could do that with CPAN stuff now too.
To illustrate that, I'll try to give a real-world example Thanks, it's much easier to talk about specific situations. To deliver these messages, I send them off to another server (using my own invented pseudo-RMI to call a method on that server). I would use HTTP for that, because I'm too lazy to write the RMI code myself. 1. The server that does the delivery has plenty of threads (er, an Apache/mod_perl child) so I hope I have enough of them to deliver the messages at the rate the backend server generates them: one child might take up to 5 seconds to deliver the message but there are plenty of children. Not good. I've seen how this works, and it fails miserably when a delivery mechanism barfs. If they were so quick to process that you could do it that way, I would have just handled them in the original mod_perl server with a cleanup_handler. Obviously they are not, so that's not an option here. 2. Same as 1 but I never allow one delivery mechanism to use all my Apache/mod_perl children by adding some form of IPC (darned, need to solve my data sharing issues first!) I think they are already solved if you look at the modules I suggested. so the children check what the others are currently doing: if a request comes in for a particular delivery mechanism, I check if we're already doing N delivery attempts and drop the request somewhere (database/file, whatever) if not. I have a daemon running that monitors that queue. I would structure it like this: - Original server takes request, and writes it to a database table that holds the queue. - A cron job checks the queue for messages, reads the status from MLDBM::Sync to see if we have free processes, and passes the request to mod_perl if we do. (Note that this could also be done with something like PersistentPerl instead.) If there are no free processes, they are left on the queue. That daemon gets complicated quickly as it also has to throttle delivery attempts My approach only puts that logic in the cron job.
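Perrin's structure can be sketched as a cron-driven worker; every table, column, URL, and limit below is invented for illustration:

```perl
#!/usr/bin/perl
# Cron job: drain the pending message queue, capping concurrent
# delivery attempts per mechanism (the throttling logic lives here).
use strict;
use DBI;
use LWP::UserAgent;

my $MAX_PER_MECHANISM = 4;    # assumed per-mechanism concurrency cap
my $dbh = DBI->connect('dbi:mysql:app', 'user', 'pass', { RaiseError => 1 });
my $ua  = LWP::UserAgent->new(timeout => 10);

# The original server only inserts rows into msg_queue; this job
# decides what actually gets delivered and when.
my $jobs = $dbh->selectall_arrayref(
    q{select id, mechanism, payload from msg_queue
      where status = 'pending' order by id},
    { Slice => {} });

my %in_flight;
for my $job (@$jobs) {
    # Skip (leave queued) anything over the per-mechanism cap.
    next if ++$in_flight{$job->{mechanism}} > $MAX_PER_MECHANISM;

    # Hand the work to mod_perl over plain HTTP, as in Bas's setup.
    my $res = $ua->post('http://localhost/deliver',
                        { id => $job->{id}, payload => $job->{payload} });

    # Failures stay 'pending' and are retried on the next cron run.
    my $status = $res->is_success ? 'sent' : 'pending';
    $dbh->do('update msg_queue set status = ? where id = ?',
             undef, $status, $job->{id});
}
```

A real version would also record attempt counts and back off a mechanism that keeps failing, but the point stands: the queue is just a table, and the throttling is one small script rather than a daemon.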
I need some form of persistent storage (with locking) The relational database. Or MLDBM::Sync if you prefer. what do I do when the delivery mechanism has failed for 6 hours and I have 12000 messages in the queue *and* make sure current messages get sent in time? I don't know, that's an application-specific choice. Of course JMS doesn't know either. 3. I install qmail on the various servers, and use that to push messages around. This'll take me a week or so (hopefully) to get it running reliably in production One of the major selling points for qmail is easier setup. You could use pretty much any mail server though if you have more experience with something else. I just like qmail because it's fast. Later on, I realise that for each message, a full-blown process is forked *per message*: load up perl, compile perl code etc. I described how to avoid this in another message: use PersistentPerl or equivalent, or pass
Re: implementing a set of queue-processing servers
Matt Sergeant writes: There's a huge difference in what they are trying to achieve though. POE doesn't open any files and it doesn't write any files to disk. None of it is written in C (yet), so unless there's a buffer overrun or type mismatch bug in perl you can exploit, you're not going to get in that way. I agree that Perl is a safe language (independent of taint, which adds safety). Unfortunately, there has been a history of insecure Perl programs (formail.pl, I think being the most famous). This may be a consequence of bad programming, but you have to look at the average if you are selecting a system without reviewing every line of code, i.e., performing a security audit. I trust Linux more than Apache, for example, because Linux is not only older, but was built using an interface design which is 30 years old and has been allowed to evolve. I'm not honestly suggesting it's bug free, but I fail to see how a bug in POE would give you access to the system. Use of a user string incorrectly in a system or open might do it. Also, an incorrect chown, chmod, umask, etc. Now user code written on top of POE (or Apache) is another matter altogether. :) Rob
Re: web security (was implementing a set of queue-processing servers)
Gunther Birznieks writes: I am not sure it is a bad example. It is an extreme example, so therefore biased, but Apache is also a biased project because of Apache's role in the Web. Agreed. That's true. But if you have a collocation facility, you also don't have an intranet on the other side like a host-based system. I don't really consider collocated servers enterprise in the sense of having to link with real systems like Sabre or Funds Transfer for a bank account, medical records lookup, etc... If you start looking around, you'll find a lot of companies are trusting collocation facilities for large financial transactions. There really isn't a practical way to host these in a colocated facility and still claim the same level of security you could architect otherwise. So that's why Exodus went out of business! ;-) I think the problem of software security can be solved without considering physical access. Also, most corporate computer rooms are probably less secure than commercial collocation facilities--at least from my experience. But then it depends on your risk level. Personally, I like allowing VPN in (eg SSH) for my own convenience. But I've yet to run into a bank (for example) or large corporate with similar types of systems that allow SSH in. Some even don't allow SSH out because they fear it as a channel through which large amounts of proprietary data can be transferred by internal employees. But they run Wi-Fi without a problem. :-) The norm I think is to find SSH coming in from outside to be verboten and SSH coming to the server from inside to be grudgingly OK and usually only allowed through the firewall from some specific operational hosts. You are correct, but I don't think this actually solves a security problem. Otherwise, most companies wouldn't have problems with virii. They do, and that's the type of attack we are most likely to run into.
I've run into many dot.com startups that allow us that convenience, but never a larger corporate especially if their web services are more complex (eg granting limited access to medical records). On one job I had a large medical clinic *email* medical records for test data. These were real people's records. I was shocked, but not for long. I have found that most IT policies are porous, esp. at the top. Consultants come in with their own laptops. I've done this on numerous occasions at large companies. The point is that while some of the IT security policies stop a certain amount of nuisance security problems, they don't normally prevent crackers. Usually, once inside, you can *telnet* from machine to machine. Never mind SMB. I think this is reasonable for a co-location center. But not for an enterprise that has its own webserver. If the only thing protecting the webserver from the intranet is iptables on the webserver itself, what happens when someone breaks into the webserver? The next thing to go would be iptables and then the machine is exposed to the rest of the intranet. Well, that's where good sysadmin comes in, and why you sandbox apache (run it as a non-root user). You would need a machine between the web server and the intranet to block access definitively. This is impossible. I think iptables/ipchains is one of the coolest things to get free with linux compared to other OSes, but also if the machine is a public service machine and someone breaks into that public service, they can disable security features for sure. Yes, and Cisco installed IIS in its DSL routers. Can you say Nimda? Cisco is not perfect, and neither is Linux. If the FW is separate, even if they break into the web server, the cracker can't all of a sudden open up a lot of other ports such as conveniently allowing the web server to listen to telnet and FTP and installing those services.
The cracker would have to either disable the web server and install FTP to listen on port 80 (thereby obviously disrupting service) or install some CGI's that do the equivalent (which won't be as convenient). This doesn't make sense. If Apache or POE is cracked and can run arbitrary code, then anything can *go* *out* from the inside. This reverse tunneling is pretty much what these DDoS do. A FW usually can't prevent opening port 80 on a remote server. That's all you need to spread or attack. There are even those who advocate putting in two different firewalls between the layers so if a bug is found in one firewall, the other firewall will still hold the rules up. The logic is that the likelihood of both firewalls having an exploit discovered at the same time is extremely unlikely. A good idea. However, you also increase your risk when the software runs on a general-purpose machine. I think that's what we're talking about: POE vs Apache on a server. Multi-layered firewalls are fine, but they can't stop a compromised machine from doing
Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)
On Friday, Nov 22, 2002, at 02:49 Europe/London, Gunther Birznieks wrote: I disagree. I think it depends on the protocol. A well-designed protocol for an application will spread and stand the test of time. Sometimes the protocol doesn't have to be well designed; just being standard can help tremendously. E.g. if we were a world that said HTTP is it and we should do everything over HTTP, then would you really see SMTP over HTTP? SNMP over HTTP? telnet over HTTP? Why? This doesn't really make sense to me. [OT, because I know this isn't really your point] As someone whose entire job revolves around SMTP these days, I'd love to see mail go over HTTP. SMTP's got no concept of negotiation. It's got little in the way of versioning (HELO vs EHLO). It's got no permanent redirect (e.g. [EMAIL PROTECTED] is now [EMAIL PROTECTED]). It's got very weak handling of binary data. Writing mail server plugins is very non-standardised. Don't get me wrong, SMTP is a great protocol, but HTTP is sometimes just *so* much nicer :-) Matt.
Re: web security (was implementing a set of queue-processing servers)
Rob Nagler wrote: This isn't because more eyes looked at postfix than sendmail, but that the eye that designed postfix was a security-minded eye, and his friends, who are also security minded, likely had a hand in the audit. Sendmail is a bad example. ;-) I agree that quality does make a difference. I'm speaking about averages. I am not sure it is a bad example. It is an extreme example, and therefore biased, but Apache is also a biased project because of Apache's role in the Web. What I mean is that if this were a secure site, you would never allow SSH to come in from the outside layers to the progressively internal layers. Connections should only be allowed from the inside out. When all I have is a colocation facility, there's no choice. I've got to come in through the front end. That's true. But if you have a colocation facility, you also don't have an intranet on the other side like a host-based system. I don't really consider colocated servers enterprise in the sense of having to link with real systems like Sabre, or funds transfer for a bank account, medical records lookup, etc. There really isn't a practical way to host these in a colocated facility and still claim the same level of security you could architect otherwise. So if you have a separate firewall-protected zone for the web server, Are you saying that the firewall protects your network, and defines that as inside? Well, not necessarily THE firewall, but a firewall or group of firewalls as external entities protecting access to and from various network partitions. Sometimes people call these DMZs or multi-DMZs, and others will say that the word DMZ is completely incorrect because each network segment really has its own rules. The only thing the web server should have access to is the protocol and port to access the app server. You have to be able to log in. I don't see how you would administer it otherwise? 
In an enterprise system, your operators would be on the inside of the LAN and be able to go from inside out. You are talking as if you are a 3rd-party vendor or the colocation provider, and therefore you have to go from outside in. Outside in is always going to be less secure. But then it depends on your risk level. Personally, I like allowing VPN in (e.g. SSH) for my own convenience. But I've yet to run into a bank (for example) or large corporation with similar types of systems that allow SSH in. Some even don't allow SSH out, because they fear it as a channel through which large amounts of proprietary data can be transferred by internal employees. The norm, I think, is to find SSH coming in from outside to be verboten, and SSH coming to the server from inside to be grudgingly OK, and usually only allowed through the firewall from some specific operational hosts. Anyway, again, different hosts have different issues. For myself, just as you, I prefer convenience. But very few corporate clients I have allow us that convenience. In fact, I've never run into a corporate client that allows us that convenience. I've run into many dot.com startups that allow us that convenience, but never a larger corporation, especially if their web services are more complex (e.g. granting limited access to medical records). And many times they are because they are useful. It's pretty rare to find a bare Apache. If we are talking about enterprise systems, they had better be bare, or the programmers/admins are not very good at what they do. There's no need to run inetd, popd, etc. on most systems. By not bare I meant Apache itself. For example, I don't think it's uncommon to find mod_proxy, mod_rewrite and mod_ssl on an Apache exposed as the front end, at minimum. 
In a 3-tier application where access to the app server and DB server is also protected by FWs, then if a cracker cracks the Apache web server, the fact that they have to crack the app server, which is running a separate set of code (e.g. POE), is going to be a major hindrance. I like this quote: In the early 1990s, firewall pioneer Bill Cheswick described the network perimeter where he worked at Bell Labs as having a crunchy shell around a soft, chewy center. We don't have any firewalls. All machines run ipchains or iptables. They run minimal configurations. We only allow encrypted access except for public Web servers. Firewalls are a crutch for bad security. Your network has to be composed of jawbreakers. I think this is reasonable for a co-location center. But not for an enterprise that has its own web server. If the only thing protecting the web server from the intranet is iptables on the web server itself, what happens when someone breaks into the web server? The next thing to go would be iptables, and then the machine is exposed to the rest of the intranet. You would need a machine between the web server and the intranet to block access definitively. I think iptables/ipchains is one
Re: web security (was implementing a set of queue-processing servers)
Gunther Birznieks writes: That will surely be easier than figuring out the vulnerabilities for myself. Allowing an exploit to be posted will let me be a part-time cracker, and all I need to do is wait with a skeleton of injection code, ready to strike when the exploit is publicized. But in the latter case (finding vulnerabilities myself), I will likely have to make it my full-time job (either that or I would have to be a high school/college student :)) I agree, but I think we have digressed into the realm of motivation... Whereas you are much less likely to see a security audit on someone's EJB or POE server, because it is not on the front line. I'm sure Arthur Andersen sells such auditing services. :-) This isn't because more eyes looked at postfix than sendmail, but that the eye that designed postfix was a security-minded eye, and his friends, who are also security minded, likely had a hand in the audit. Sendmail is a bad example. ;-) I agree that quality does make a difference. I'm speaking about averages. What I mean is that if this were a secure site, you would never allow SSH to come in from the outside layers to the progressively internal layers. Connections should only be allowed from the inside out. When all I have is a colocation facility, there's no choice. I've got to come in through the front end. So if you have a separate firewall-protected zone for the web server, Are you saying that the firewall protects your network, and defines that as inside? The only thing the web server should have access to is the protocol and port to access the app server. You have to be able to log in. I don't see how you would administer it otherwise? And many times they are because they are useful. It's pretty rare to find a bare Apache. If we are talking about enterprise systems, they had better be bare, or the programmers/admins are not very good at what they do. There's no need to run inetd, popd, etc. on most systems. 
But in any case, I think my first and primary point has also been lost. Never! I treasure it. In a 3-tier application where access to the app server and DB server is also protected by FWs, then if a cracker cracks the Apache web server, the fact that they have to crack the app server, which is running a separate set of code (e.g. POE), is going to be a major hindrance. I like this quote: In the early 1990s, firewall pioneer Bill Cheswick described the network perimeter where he worked at Bell Labs as having a crunchy shell around a soft, chewy center. We don't have any firewalls. All machines run ipchains or iptables. They run minimal configurations. We only allow encrypted access except for public Web servers. Firewalls are a crutch for bad security. Your network has to be composed of jawbreakers. However, even if you are thinking a cracker will discover vulnerabilities from scratch, and you think it is easy to do so on POE, I think you are still majorly hindering the cracker by having POE exist. A.k.a. security through obscurity. It works, but I also think it breeds bad designs, just like sessions in Web servers lead to laziness and bad designs. In summary, I think it is much more plausible that your DB server (or mainframe host) is toast if all you have as a layer in front of it is Apache than if you have an app server layer between the two. Which is why we encrypt all critical data in our DB, and we start our Web servers by hand with a long key. Rob
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Aaron Johnson wrote: This model has eased my testing as well, since I can run the script completely externally to the web server; I can run it through a debugger if needed. You realize that you can run mod_perl in the debugger too, right? I use the profiler and debugger with mod_perl frequently. - Perrin
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Aaron Johnson wrote: I know you _can_, but I don't find it convenient. For me it's pretty much the same as debugging a command-line script. To debug a mod_perl handler I just do something like this: httpd -X -Ddebug Then I hit the URL with a browser or with GET and it pops me into the debugger. I have httpd.conf set up to add the PerlFixupHandler +Apache::DB line when it sees the debug flag. I still don't like to give Apache long processes to manage; I feel this can be better handled externally to the server, and in my case it allows for automation/reports on non-mod_perl machines. I try to code it so that the business logic is not dependent on a certain runtime environment, and then write a small mod_perl handler to call it. Then I can use the same modules in cron jobs and such. It can get tricky in certain situations, though, when you want to optimize something for a long-running environment but don't want to break it for one-shot scripts. - Perrin
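For readers who haven't set this up: a minimal sketch of the httpd.conf arrangement Perrin describes, following the Apache::DB documentation for mod_perl 1.x. The /myapp location is a made-up example; Apache::DB's docs also recommend calling init() before your application modules are loaded.

```apache
# Start a single-process server in debug mode with:  httpd -X -Ddebug
<IfDefine debug>
    <Perl>
        use Apache::DB ();
        Apache::DB->init;
    </Perl>
    <Location /myapp>
        PerlFixupHandler +Apache::DB
    </Location>
</IfDefine>
```

Hitting any URL under /myapp then drops the foreground server into the Perl debugger.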
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Perrin Harkins writes: I try to code it so that the business logic is not dependent on a certain runtime environment, and then write a small mod_perl handler to call it. I've been doing a lot of test-first coding. It makes it so that you start Apache, and the software just runs. With sufficient granularity of unit tests, we find that we don't use the debugger. Run the test, and it tells you what's wrong. Rob
Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)
Gunther Birznieks writes: In the context of what you are saying, it seems as if everyone should just stick to using TCP/IP/Telnet as a protocol and then the world would be a better place. Once upon a time, there was OSI, SNA, DECnet, etc. Nowadays, all computers talk IP, even if you connect from AOL. Yes, the other protocols are still around, but nobody in their right mind would recommend them anymore. But I don't think this is so. Everyone ends up creating their own protocols, their own algorithms on top of TCP on how to communicate. Because it's FUN, and you probably can get a Ph.D. thesis out of it. ;-) In a way it is simpler because you just have the freedom to create whatever you want. But in another way, it is a nightmare because everyone will just implement their own way of doing things. This can be OK in some contexts, but I find it difficult to believe that this is the best thing overall. I'm not advocating this. Rather, I am recommending using a well-known, and arguably the most widely-used, protocol: application/x-www-form-urlencoded--and its near cousin, multipart/form-data. However, that's messy; we can just call it HTTP, and our implementation is LWP and Apache. At least with J2EE, for every major standard or protocol implemented, there is only one way to do it. With Perl, you actually have more confusion because there are many more ways to do it. More ways to do templating, more ways to do middleware, more ways to do serialization of objects, etc... There are an equivalent number of ways in both languages. If you are saying that you could build a standard component in, say, EJB, and sell it, well, that's just not the case. That's the pipe dream of CORBA. The only thing close to portable protocols is HTTP. Sabre, for example, gives you a library, and you have to interface to it. However, authorize.net's interface is HTTP, and I can write my own library in 100 lines of Perl, which matches my application, and doesn't require me to install anything. 
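To make the "form-urlencoded as the universal protocol" point concrete, here is a sketch of one POST built with stock LWP. The gateway URL and field names are invented for illustration, not any real service's interface.

```perl
#!/usr/bin/perl
# One request in application/x-www-form-urlencoded, using the standard
# HTTP::Request::Common helper that ships with libwww-perl.
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

my $req = POST 'https://gateway.example.com/transact',
    [ amount => '19.95', description => 'two tickets' ];

print $req->content, "\n";    # amount=19.95&description=two+tickets

# To actually send it (not done here):
#   my $res = LWP::UserAgent->new->request($req);
#   die $res->status_line unless $res->is_success;
```

The whole "protocol" is the key=value encoding every browser already speaks, which is why a custom client library stays small.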
There's such a thing as standard protocols, but every application uses them differently. Rob
Re: implementing a set of queue-processing servers
Gunther Birznieks writes: If you had an Apache server and a POE app server, which would a cracker have an easier time trying to get into? Assuming up-to-date code, POE, for sure. Probably the Apache server. Once broken through the Apache server, the cracker would have to figure out that it is indeed a POE server on the other end, and then figure out an exploit by just trying as many things as they can. I.e., they'd have to do a lot of extra work rather than utilizing a public-knowledge exploit someone else discovered. All public-knowledge exploits of Apache are fixed within days if not hours. It's the private ones I worry about. There have to be more of these in POE than Apache. The more eyes, the fewer the defects. How? Why would any firewall admin allow SSH access from the outside world to poke progressively inwards through the protected zones? When we want to get to the middle tiers, we go in through the front ends. You need passwords at every level. I'm not sure what you mean here. I think this is correct. But as most servers that handle transactions have mod_ssl, I kind of consider mod_ssl and other modules to be fairly core to Apache. They have to be configured to be exploited. Rob
asynchronous execution, was Re: implementing a set of queue-processing servers
At 08:18 PM 11/18/2002 -0700, Rob Nagler wrote: We digress. The problem is to build a UI to Sabre. I still haven't seen any numbers which demonstrate the simple solution doesn't work. Connecting to Sabre is no different than connecting to an e-commerce gateway. Both can be done by connecting directly from the Apache child to the remote service and returning a result. Hi, My question with this approach is not whether it works for synchronous execution (the user is willing to wait for the results to come back) but whether it makes sense for asynchronous execution (the user will come back and get the results later). In fact, we provide our users with the option: 1. fetch the data now and display it, OR 2. put the request in a queue to be fetched and then later displayed We have a fixed number of mainframe login IDs, so we can only run a limited number (say 4) of them at a time. So what I think you are saying for option 2 is: * Apache children (web server processes with mod_perl) have two personalities: - user request processors - back-end work processors * When a user submits work to the queue, the child is acting in a user request role and it returns the response quickly. * After detaching from the user, however, it checks to see if fewer than four children are processing the queue and if so, it logs into the mainframe and starts processing the queue. * When it finishes the request, it continues to work the queue until no more work is available, at which time, it quits its back-end processor personality and returns to wait for another HTTP request. This just seems a bit odd (and unnecessarily complex). Why not let there be web server processes and queue worker processes and they each do their own job? Web servers seem to me to be for synchronous activity, where the user is waiting for the results. Stephen P.S. 
Another limitation of the "use Apache servers for all server processing" philosophy seems to be scheduled events or system events (those not initiated by an HTTP request, which are user events). Example: our system allows users to set up a schedule of requests to be run, i.e. every Tuesday at 3:00am, put this request into the queue. This is a scheduled event rather than a user event. How is a web server process going to wake up and begin processing this? (unless of course everyone who puts something into the queue must send a dummy HTTP request to wake up the web servers)
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Stephen Adkins wrote: So what I think you are saying for option 2 is: * Apache children (web server processes with mod_perl) have two personalities: - user request processors - back-end work processors * When a user submits work to the queue, the child is acting in a user request role and it returns the response quickly. * After detaching from the user, however, it checks to see if fewer than four children are processing the queue and if so, it logs into the mainframe and starts processing the queue. * When it finishes the request, it continues to work the queue until no more work is available, at which time, it quits its back-end processor personality and returns to wait for another HTTP request. This just seems a bit odd (and unnecessarily complex). It does when you put it like that, but it doesn't have to be that way. I would separate the input (user or queue) from the processing part. You'd have a module that runs in mod_perl which knows how to process requests. You have a separate module which can provide a UI for placing requests. Synchronous ones go straight to processing, while asynch ones get added to the queue. You'd also have a controlling process that polls the queue and if it finds anything it uses LWP to send it to mod_perl for handling. I would make this a tiny script triggered from cron if possible, since cron is robust and can handle outages and error reporting nicely. Why not let there be web server processes and queue worker processes and they each do their own job? Web servers seem to me to be for synchronous activity, where the user is waiting for the results. When I think of queue processing, I think of a system for handling tasks in parallel that provides a simple API for plugging in logic, a well-defined control interface, logging, easy configuration... sounds like Apache to me. You just need a tiny control process to trigger it via LWP. 
Apache is already a system for handling a queue of HTTP requests in parallel, so you just have to make your requests look like HTTP. You certainly could do this other ways, but you'd probably have to write a lot more code or else use something far less reliable than Apache. P.S. Another limitation of the "use Apache servers for all server processing" philosophy seems to be scheduled events or system events (those not initiated by an HTTP request, which are user events). Cron/at + LWP. - Perrin
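The "tiny control process" can literally be one crontab entry; GET here is the lwp-request alias installed with libwww-perl, and the URL is a made-up example for whatever handler does the queue polling:

```crontab
# Poll the queue once a minute and let mod_perl do the real work.
* * * * *  GET http://localhost/queue/run >/dev/null 2>&1
```

Cron then gives you restart-after-outage and mailed error reports for free, which is the robustness Perrin is pointing at.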
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
Hi Stephen, On Tue, 19 Nov 2002, Stephen Adkins wrote: My question with this approach is not whether it works for synchronous execution (the user is willing to wait for the results to come back) but whether it makes sense for asynchronous execution (the user will come back and get the results later). What kind of interface will you provide to the final users? In fact, we provide our users with the option: 1. fetch the data now and display it, OR 2. put the request in a queue to be fetched and then later displayed We have a fixed number of mainframe login IDs, so we can only run a limited number (say 4) of them at a time. So it is possible that an immediate request is queued if the system has already reached its maximum allowed logins. In other words, a final user can request to display data immediately, but your middleware can answer that the request has been queued, possibly saying 'your job id, position n, see you later'. Moreover, you must preserve the order of requests. And if I recall correctly, you talked about a sort of queue listing and some job manipulation. Whatever your choice, you undoubtedly need to serialize requests and enqueue them using a DBMS. It is the simplest approach. Given methods to add, list or remove requests from this kind of queue, mod_perl (even plain CGI scripts) can use these methods to manipulate a user's job. Using the access control supplied by Apache, it is possible to give different access rights to users of the middleware. Requests from final users will always be enqueued by an Apache child, which will get a job id and its position in the queue. If the job is on top of the queue, you will immediately wait for its completion. Otherwise you can tell the user to check his job queue later. Users can remove jobs from the queue. All completed jobs will be stored somewhere (file system and/or db) and can be listed by legitimate users. Completed jobs will show in a separate queue. 
An external entity will dequeue jobs and process them, probably using something like Parallel::ForkManager to limit concurrent requests. Another entity will enqueue recurring jobs. Jobs scheduled for future processing should always be enqueued immediately; otherwise I can't imagine a coherent interface for removing jobs. These entities look like daemons, which can be spawned and controlled using code executed by Apache. Please note that I never mentioned HTML; using Apache as your infrastructure you can build whatever interface you need. Requests recorded inside the db can also be used to implement a cache, probably reused by subsequent requests. It would be possible to collapse identical requests (to save logins). Obviously it is possible to replace Apache with POE or Stem, but I don't know how, sorry. There are many other solutions, but this sketch describes my way of doing it. Sorry for the length of this message. Ciao, Valerio Valerio Paolini, http://130.136.3.200/~paolini -- Linux, the Cheap Chic for Computer Fashionistas
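A sketch of the dequeueing entity described above, assuming Parallel::ForkManager as suggested. fetch_pending() and process_job() are hypothetical stand-ins for the real queue query (SELECT/UPDATE against the queue table) and the real mainframe session.

```perl
#!/usr/bin/perl
# Cap concurrency at the number of mainframe logins; the ForkManager
# blocks in start() until a slot is free.
use strict;
use warnings;
use Parallel::ForkManager;

my $MAX_LOGINS = 4;                       # the fixed number of mainframe ids
my $pm = Parallel::ForkManager->new($MAX_LOGINS);

my @jobs = fetch_pending();
for my $job (@jobs) {
    $pm->start and next;                  # parent: dispatch the next job
    process_job($job);                    # child: holds one login slot
    $pm->finish;                          # child exits, freeing the slot
}
$pm->wait_all_children;
print "processed ", scalar(@jobs), " jobs\n";

sub fetch_pending { return (1 .. 8) }     # stand-in for a DB query
sub process_job   { select undef, undef, undef, 0.05 }  # stand-in work
```

The same loop body works whether the jobs come from a DBMS table or anywhere else; only fetch_pending() changes.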
Re: asynchronous execution, was Re: implementing a set of queue-processing servers
On Tue, 2002-11-19 at 16:28, Stephen Adkins wrote: At 08:18 PM 11/18/2002 -0700, Rob Nagler wrote: We digress. The problem is to build a UI to Sabre. I still haven't seen any numbers which demonstrate the simple solution doesn't work. Connecting to Sabre is no different than connecting to an e-commerce gateway. Both can be done by connecting directly from the Apache child to the remote service and returning a result. Hi, My question with this approach is not whether it works for synchronous execution (the user is willing to wait for the results to come back) but whether it makes sense for asynchronous execution (the user will come back and get the results later). In fact, we provide our users with the option: 1. fetch the data now and display it, OR 2. put the request in a queue to be fetched and then later displayed We have a fixed number of mainframe login IDs, so we can only run a limited number (say 4) of them at a time. So what I think you are saying for option 2 is: * Apache children (web server processes with mod_perl) have two personalities: - user request processors - back-end work processors * When a user submits work to the queue, the child is acting in a user request role and it returns the response quickly. * After detaching from the user, however, it checks to see if fewer than four children are processing the queue and if so, it logs into the mainframe and starts processing the queue. * When it finishes the request, it continues to work the queue until no more work is available, at which time, it quits its back-end processor personality and returns to wait for another HTTP request. This just seems a bit odd (and unnecessarily complex). Why not let there be web server processes and queue worker processes and they each do their own job? Web servers seem to me to be for synchronous activity, where the user is waiting for the results. I am doing something similar right now in a project. It has to make approx. 
220 requests to outside sources in order to compile a completed report. These reports vary in time to create based on the data sources and network traffic. This is the solution I have in place currently: 1) User visits web page (handled by mod_perl) and they make the request for a report. 2) The request parameters are stored into a temp file and the user is redirected to a wait page. The time spent on the wait page varies, and an approximate time is created based on query complexity. The user session is given a key that matches the temp file name. 3) A separate dedicated server (Proc::Daemon based) picks up the temp file and spawns a child to process it. This daemon looks for new temp files every X seconds, where X is 15 seconds, but it could easily be adjusted. It keeps a queue of the temp files that have been processed and drops them from the queue after 45 minutes even if they haven't run. 4) The child recreates the user's object and runs the report; when it completes, it deletes the temp file. If it fails to complete, the temp file remains. 5) When the auto refresh takes place, the system determines whether the user's request has completed by looking for the temp file named in their session data. If the file exists, they are given another wait page with a 30 to 120 second wait time. If it doesn't exist, then the cached information from the report (just an XML file created from an XML::Simple dump of the hash containing the report data) is processed and presented as HTML to the user. I had attempted using a mod_perl-only solution, but I didn't like tying up the server with additional processing that could be handled externally. This method also allows for the server script to reside on a separate machine (allowing for some shared filesystem: Samba, NFS, etc.) without having to recreate an entire mod_perl environment. This model has eased my testing as well, since I can run the script completely externally to the web server; I can run it through a debugger if needed. 
I also use the same script for nightly automated common reports to limit the number of real-time requests, since the data doesn't change that frequently in my case. Stephen P.S. Another limitation of the "use Apache servers for all server processing" philosophy seems to be scheduled events or system events (those not initiated by an HTTP request, which are user events). I agree with Perrin: you can use LWP to emulate a user's HTTP request if you want to use an HTTP-style request. cron/at represents the best way to handle this (IMHO). In my case I run the cron job and it generates the temp files; these temp files get picked up by the looping server (a simple non-mod_perl daemon) and processed. So I don't use LWP, but I could send the request to the server and have it create the temp files just as easily; I just happen to have the logic abstracted to where I don't need to involve mod_perl. Aaron
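A sketch of step 3 of Aaron's scheme: scan a spool directory for request files and fork one child per file. The directory, the ".req" suffix, and run_report() are all made up for illustration; the real daemon wraps a scan like this in Proc::Daemon::Init() plus a sleep-15 loop, and uses the shared filesystem path.

```perl
#!/usr/bin/perl
# One pass of the temp-file poller: a leftover .req file after the pass
# means the job failed and will be retried (or aged out after 45 min).
use strict;
use warnings;
use File::Temp qw(tempdir);

my $spool = tempdir(CLEANUP => 1);
for my $name (qw(a1 b2)) {              # pretend mod_perl queued two jobs
    open my $fh, '>', "$spool/$name.req" or die "open: $!";
    close $fh;
}

for my $file (glob "$spool/*.req") {
    defined(my $pid = fork) or die "fork: $!";
    next if $pid;                       # parent keeps scanning
    run_report($file);                  # child runs the report ...
    unlink $file;                       # ... and removes the file on success
    exit 0;
}
1 while wait != -1;                     # reap children before re-scanning

my @left = glob "$spool/*.req";
print "remaining: ", scalar(@left), "\n";

sub run_report { }                      # stand-in for the real report code
```

Because completion is signaled purely by the file's existence, the mod_perl side can check job status with a simple -e test, exactly as in step 5.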
Re: implementing a set of queue-processing servers
Rob Nagler wrote: My experience is just the opposite. If you reuse code, most servers contain that code base and are therefore large relative to very specific applications. Most of our mod_perl servers are 15MB minimum, and grow to up to 80MB. But what if the code is not meant to be reused except within the context of what you are processing? But from what you are saying, I suppose I would agree that the integration point with Sabre is pretty much straight through and not much custom logic. The problem is sharing, routing, load balancing, etc. If you run separate processes, there's no chance to share. If you run separate processes, you need more configuration, documentation thereof, and design for peak load becomes more difficult. I think that is a good point. Maybe this could be segmented using a reverse proxy to distinguish whether it goes to a mod_perl process that talks to Sabre or one that does other app stuff. This stuff is handled on the back end. I don't see where proxies would help. I had mentioned the above in the context of providing an alternative in case there is no backend. But alternatively, if the Perl code that does the logic of talking to Sabre and massaging that for a mod_perl app is split out and stored in POE or PerlRPC, then it would be better to have the 5 middleware processes dealing with the shared memory stuff. POE and PerlRPC are fatter than the Sabre code, I bet. Yeah, I think you are right. I've not used POE; PerlRPC is fairly thin, but you are right that it is fatter. So I think this is also a good point. Apache is better than IIS, but I would not call it secure in an absolute sense. There have been plenty of exploits for Apache over the last year that give me headaches, having to patch ASAP when discovered. Is POE or PerlRPC more secure than Apache? I seriously doubt it. 
That could be, but how many exploits of POE or PerlRPC or core Perl (which would also be exploitable) have been posted on Bugtraq in the last year? Very few if any (zero, I think?). What about Apache? Definitely Not Zero. This doesn't mean that POE is more secure than Apache, but it does mean that there are fewer publicized exploits. If POE became a popular web server, certainly more people would be trying to break it actively, and perhaps they would find more exploits. So... If you had an Apache server and a POE app server, which would a cracker have an easier time trying to get into? Probably the Apache server. Once broken through the Apache server, the cracker would have to figure out that it is indeed a POE server on the other end, and then figure out an exploit by just trying as many things as they can. I.e., they'd have to do a lot of extra work rather than utilizing a public-knowledge exploit someone else discovered. If you are a cracker and have hacked someone's Apache, but then your next crack has to find an exploit in a daemon written in Perl like POE before finally getting to the database or backend system, you are still slowing down your attacker. Usually at worst, the attacker will have to figure out something about how POE works. The cracker will go to the OS. They aren't going to proxy-hop. They'll try a telnet, ssh, dns, etc. exploit once they are on the inside. Sure, if you run them on the same DMZ or the same server. The assumption is the application server is in a separate zone which only allows requests in from the web server and requests out to the DB server or other resource. If security is a concern, I wouldn't see someone dumping all the code they are running and the database on the same machine, because then if they get into Apache, they have the keys to the kingdom. 
I believe more script kiddies/casual crackers can probably log into Sybase, Oracle, or MySQL databases and trash them than can figure out how to talk to an RMI engine, EJB server, SOAP, or POE middleware for an application layer prior to accessing the database. If this is a large-scale app, there will be front ends, middle tiers, and databases. If they crack ssh, they're through the system. If How? Why would any firewall admin allow SSH access from the outside world to poke progressively inwards through the protected zones? they crack Apache, they still have to exploit the specific attack. Middle tiers run mod_perl, but not mod_proxy and mod_ssl. Front ends run mod_proxy and mod_ssl, but not mod_perl. The cracks on Apache have all been on specific modules, not on the Apache core. The weak link is not Apache or the middleware, because the connections, as you point out, are too complex. I think this is correct. But as most servers that handle transactions have mod_ssl, I kind of consider mod_ssl and other modules to be fairly core to Apache. We digress. The problem is to build a UI to Sabre. I still haven't It is a digression, but also an important one. Security of external and
Re: protocol explosion (was asynchronous execution, was Re: implementing a set of queue-processing servers)
Rob Nagler wrote: The antithesis of this is J2EE, which introduces an amazing amount of complexity through protocol explosion (is it a Message/Session/Entity Bean, do I use JMX, JMS, RMI, etc.). It creates tremendous confusion, and their software is certainly less reliable than Apache. I think this is not a fair statement about J2EE (except the less reliable part). In the context of what you are saying, it seems as if everyone should just stick to using TCP/IP/Telnet as a protocol and then the world would be a better place. But I don't think this is so. Everyone ends up creating their own protocols, their own algorithms on top of TCP on how to communicate. In a way it is simpler because you just have the freedom to create whatever you want. But in another way, it is a nightmare because everyone will just implement their own way of doing things. This can be OK in some contexts, but I find it difficult to believe that this is the best thing overall. At least with J2EE, for every major standard or protocol implemented, there is only one way to do it. With Perl, you actually have more confusion because there are many more ways to do it. More ways to do templating, more ways to do middleware, more ways to do serialization of objects, etc...
Re: implementing a set of queue-processing servers
On Sat, 16 Nov 2002, Stephen Adkins wrote: Are you also interested in fault tolerance and accuracy of computation? And what about caching? accuracy of computation? of course. but this would seem to me to be a matter of program logic. caching? not sure what you mean, but caching is good if it increases performance without creating synchronization problems. Your last messages clarified your situation and needs, so I think caching is not interesting. Btw, what you described is covered in some architectural design patterns that I was studying when your question arrived in my mailbox. Ciao, Valerio Valerio Paolini, http://130.136.3.200/~paolini -- Linux, the Cheap Chic for Computer Fashionistas
Re: implementing a set of queue-processing servers
Rob Nagler wrote: Gunther Birznieks writes: Also, I suspect it probably wouldn't be efficient memory-wise. mod_perl processes are large enough with front-end code without randomly having them share a bunch of middleware/mainframe processing code also. This middleware code could probably be more tightly shared amongst a smaller number of processes that just service the mainframe stuff. The sharing should be identical, esp. if the code is written in C or C++, which most middleware (and probably Sabre) is written in. 1) Most likely there will be massaging of data specific to the application at hand, so it wouldn't be a pure C wrapper. 2) In Perl, data writes slowly unshare the copy-on-write pages holding code, so shared memory breaks down after a while. For these two reasons, it seems to me that you won't really get that much shared memory, so it would still be better to limit the code to as few engines as possible instead of the universe of engines that the application can access. So if 30 mod_perl engines are needed for the application, but only 5 at any given time are accessing the reservation system, then only 5 engines should be going to the reservation system. Maybe this could be segmented using a reverse proxy to decide whether a request goes to a mod_perl process that talks to Sabre or to one that does other app stuff. But alternatively, if the Perl code that does the logic of talking to Sabre and massaging that for a mod_perl app is split out into POE or PerlRPC, then it would be better to have the 5 middleware processes dealing with the shared memory stuff. In addition, I would advocate middleware prior to talking to a mainframe because of security. You can have someone break into the web server, but if it is hooked directly to the mainframe, then that person can hop directly onto the mainframe. Instead, the requests could be mediated and well-formed by the middleware. 
The cracker would have to hack the middleware after hacking the web server in order to get to the mainframe if you add a layer like this. Of course, maybe this is an Intranet application, so such things may not matter... Security is always a concern, which is why Apache is a much better solution. It's much like the difference between buying security from a company which builds public ATMs and buying it from companies which build corporate laptop security systems. The former is like Apache, the latter is like most (if not all) middleware. Apache has stood the test of time, because it is being attacked *continuously*. This is why Apache is so much more secure than IIS, which had a much later start and wasn't used for large sites. Apache is better than IIS, but I would not call it secure in an absolute sense. There have been plenty of exploits for Apache over the last year that give me headaches, having to patch ASAP when they are discovered. If you are a cracker and have hacked someone's Apache, but then your next crack has to find an exploit in a daemon written in Perl like POE before finally getting to the database or backend system, you are still slowing down your attacker. Usually at worst, the attacker will have to figure out something about how POE works. I believe more script kiddies/casual crackers can probably log into Sybase, Oracle, or MySQL databases and trash them than can figure out how to talk to an RMI engine, EJB server, SOAP, or POE middleware for an application layer prior to accessing the database. Later, Gunther
Re: implementing a set of queue-processing servers
On Fri, Nov 15, 2002 at 03:53:53PM -0500, Stephen Adkins wrote: At 02:09 PM 11/15/2002 -0500, Rocco Caputo wrote: On Fri, Nov 15, 2002 at 11:45:33AM -0500, Stephen Adkins wrote: QUESTIONS: * What queue mechanism would you use, assuming all of the writers and readers are on the same system? (IPC::Msg? MsgQ?) If speed is a major factor, I would use a FIFO (named pipe). This is a very lightweight and fast way to pass data between processes on the same machine. Are FIFOs (named pipes) on Unix guaranteed to maintain the integrity of the messages in the case of multiple writers? I think you could guarantee this if you imposed restrictions on the data travelling through the pipe: i.e., a single text line, which must be written in a single (unbuffered) write() system call. Otherwise, doesn't a FIFO break down as a message queue when you have multiple writers with arbitrarily long message data? According to _Advanced Programming in the UNIX Environment_, the largest atomic write is PIPE_BUF bytes. On FreeBSD, /usr/include/limits.h defines PIPE_BUF as 512 bytes. APUE also says: Indeed, the normal file I/O functions (close, read, write, unlink, etc.) all work with FIFOs. This leads me to believe that flock() could protect the integrity of large FIFO writes. I've never had the occasion to need it and can't say for sure. -- Rocco Caputo - [EMAIL PROTECTED] - http://poe.perl.org/
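Rocco's PIPE_BUF point can be made concrete with a small sketch. This is hedged, not a definitive implementation: the 512-byte floor is just the POSIX minimum (check your platform's limits.h), and the FIFO path in the commented reader is invented. The idea is to frame each message as one newline-terminated line and emit it with a single syswrite(), so the kernel's atomicity guarantee keeps concurrent writers from interleaving.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(mkfifo);

# Frame a message as one line, refusing anything too large to be
# written atomically. 512 bytes is the POSIX minimum for PIPE_BUF;
# the real value on your platform may be larger (see limits.h).
sub frame_message {
    my ($msg) = @_;
    $msg =~ s/\n//g;                       # a message must be a single line
    my $frame = "$msg\n";
    die "message exceeds atomic write size" if length($frame) > 512;
    return $frame;
}

# Writer: one syswrite() per message. Multiple writers can share the
# FIFO safely because each frame fits within PIPE_BUF.
sub enqueue {
    my ($fifo, $msg) = @_;
    my $frame = frame_message($msg);
    open my $fh, '>>', $fifo or die "open $fifo: $!";
    syswrite($fh, $frame) == length($frame) or die "short write: $!";
    close $fh;
}

# Reader side (hypothetical path), shown as comments so the sketch
# runs without blocking on a FIFO open:
# my $fifo = '/tmp/work.fifo';
# mkfifo($fifo, 0600) unless -p $fifo;
# open my $in, '<', $fifo or die "open: $!";
# while (my $task = <$in>) { chomp $task; ... one message per line ... }
```

Note that opening a FIFO for writing blocks until some reader has it open, so in practice the reader process must be started first.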
Re: implementing a set of queue-processing servers
Stephen Adkins writes: However, I have been thinking about asynchronous execution, queues, and queue-working, and I wanted to get a handle on how best I should solve the problem in a general way. I guess this is where we diverge. It sounds like you have a specific problem. Generalizing at this stage is going to be a mistake imiho. Rob
Re: implementing a set of queue-processing servers
At 02:09 PM 11/15/2002 -0500, Rocco Caputo wrote: On Fri, Nov 15, 2002 at 11:45:33AM -0500, Stephen Adkins wrote: QUESTIONS: * What queue mechanism would you use, assuming all of the writers and readers are on the same system? (IPC::Msg? MsgQ?) If speed is a major factor, I would use a FIFO (named pipe). This is a very lightweight and fast way to pass data between processes on the same machine. Are FIFOs (named pipes) on Unix guaranteed to maintain the integrity of the messages in the case of multiple writers? I think you could guarantee this if you imposed restrictions on the data travelling through the pipe: i.e., a single text line, which must be written in a single (unbuffered) write() system call. Otherwise, doesn't a FIFO break down as a message queue when you have multiple writers with arbitrarily long message data? * How about if the queue writers were distributed, but the queue readers were all on one machine? (RPC to insert into the above-mentioned local queues?) * How about if the queue writers and queue readers were all distributed around the network? (Spread::Queue::FIFO? Parallel::PVM? Parallel::MPI? MQSeries::Queue?) Your requirement #2 seems to indicate that the queue is held in a database table. In that case the queue is inherently distributable. Each machine makes its own connections to the database and processes tasks in the queue using whatever locking is necessary. Yes. In this case, you are right. The only thing that's missing is the wakeup to the servers so that they do not need to poll. This requires queue workers to poll the database for new jobs, which you later state is something you're trying to avoid. MY HUNCHES I think I'll use IPC::Msg as the queue because the queue readers will all be on one machine. I'll also have to implement a simple RPC server (using Net::Server) to perform remote insertions into the local queue. If this seems too rough, I'll probably install the Spread Toolkit and use Spread::Queue. 
I currently think I'll keep working with Net::Server to see if I can use it to process a queue rather than listen on a network port, but I'm not sure that this is the right use of the module. I may end up ditching this effort and just have a set of parallel servers all waiting on the queue. The queue mechanism itself will work out who gets to work on which request. Any input? Depending on how critical your transactions are, it may be more reliable to use the database as the queue. Jobs passed through it are saved to persistent storage, making them more likely to survive a crash. Do you need to roll forward unprocessed tasks if you must restart the server? Crash resistance is an important consideration for queues and queue workers in general. In this case, because it is primarily a read-only decision support system, if we had a system crash, the loss of requests in the queue would be the least of our worries. If you use the database as the queue, the message passing between clients and servers amounts to little more than a wake-up call: Hey, you've got a task! You are right. In fact, all my queue needs to do is say Hey, you've got a task, in order to eliminate polling when there is no work to do and to wake up the server immediately when there is work to do. I might almost use a signal. I would just need to IGNORE the signal while the server is running and reset the signal handler when the server is about to go back to sleep. However, I have been thinking about asynchronous execution, queues, and queue-working, and I wanted to get a handle on how best I should solve the problem in a general way. Stephen
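The database-as-queue-plus-signal scheme discussed above might be sketched like this. Everything application-specific here is invented: the table and column names (requests, status, worker), the DSN, and the work stub. Two ideas carry the sketch: a claim-by-UPDATE so parallel workers never grab the same row, and SIGUSR1 as the "you've got a task" nudge, with a timed sleep left in as a safety-net poll.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

sub process_request { my ($dbh, $worker) = @_; }   # application-specific stub

sub worker_loop {
    my ($dbh) = @_;                                # e.g. DBI->connect(...)
    my $wakeup = 0;
    local $SIG{USR1} = sub { $wakeup = 1 };        # clients kill -USR1 after INSERT

    while (1) {
        # Drain the queue: claim one pending row at a time. The WHERE
        # clause re-checks status, so two workers can't claim one row.
        while (1) {
            my $rows = $dbh->do(q{
                UPDATE requests SET status = 'working', worker = ?
                 WHERE id = (SELECT MIN(id) FROM requests
                              WHERE status = 'pending')
                   AND status = 'pending'
            }, undef, $$);
            last if $rows == 0;                    # DBI returns '0E0' for zero rows
            process_request($dbh, $$);
        }
        # Block until signalled; a delivered signal interrupts sleep().
        # The timed sleep also covers the small window where a signal
        # lands between the flag check and the sleep call.
        sleep 60 unless $wakeup;
        $wakeup = 0;
    }
}
```

A client then needs only an INSERT followed by a kill('USR1', $worker_pid) to eliminate both polling latency and idle-period load.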
Re: implementing a set of queue-processing servers
At 08:59 PM 11/15/2002 +0100, Valerio_Valdez Paolini wrote: On Fri, 15 Nov 2002, Stephen Adkins wrote: You seem to advocate Apache/mod_perl for end-user (returning HTML) and server-to-server (RPC) use. That makes sense. But it doesn't seem to make sense for my family of servers that spend all of their time waiting for the mainframe to return their next transaction. Are you also interested in fault tolerance and accuracy of computation? And what about caching? In my case, we have enough single points of failure in the system that fault tolerance for this component is not critical. However, I am interested in general in the solution to the [asynchronous execution + queue + queue worker] problem along with all of the issues related to scalability and reliability. accuracy of computation? of course. but this would seem to me to be a matter of program logic. caching? not sure what you mean, but caching is good if it increases performance without creating synchronization problems. Stephen
implementing a set of queue-processing servers
Hi, I have the following requirement, and I am seeking your input. 1. web-based users make requests for data which are put in a queue 2. these requests and their status need to be in a database so that users can watch the status of the queue and their requests in the queue 3. a set of servers process the requests for data from the queue and put the results in a results table so that users can view their data when their requests are done QUESTIONS: * What queue mechanism would you use, assuming all of the writers and readers are on the same system? (IPC::Msg? MsgQ?) * How about if the queue writers were distributed, but the queue readers were all on one machine? (RPC to insert into the above-mentioned local queues?) * How about if the queue writers and queue readers were all distributed around the network? (Spread::Queue::FIFO? Parallel::PVM? Parallel::MPI? MQSeries::Queue?) * What Perl server-building software on CPAN do you recommend, and why? (Net::Server? Net::Daemon? POE?) I started working with Net::Server, but it seems focused on being a network request multi-server, not a queue working multi-server. * Would you implement it as many peer-level servers waiting on a single queue? or a single parent server waiting on the queue, dispatching queued work units to waiting child servers? QUICK AND DIRTY SINGLE-SERVER SOLUTION I implemented a quick-and-dirty single-server solution, where I use a single server to process requests. I simply poll the request table in the database once a minute for new requests, and if they exist, I process them. Now I am looking to upgrade this for higher throughput (multiple parallel servers), lower background load (no polling during quiet periods), and lower latency (immediate response to queue insertion rather than waiting for the next poll interval). MY HUNCHES I think I'll use IPC::Msg as the queue because the queue readers will all be on one machine. 
I'll also have to implement a simple RPC server (using Net::Server) to perform remote insertions into the local queue. If this seems too rough, I'll probably install the Spread Toolkit and use Spread::Queue. I currently think I'll keep working with Net::Server to see if I can use it to process a queue rather than listen on a network port, but I'm not sure that this is the right use of the module. I may end up ditching this effort and just have a set of parallel servers all waiting on the queue. The queue mechanism itself will work out who gets to work on which request. Any input? Stephen
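The IPC::Msg hunch might look like the sketch below (the key 0xCAFE and the buffer size are arbitrary choices of mine). A SysV message queue gives exactly what a polling design lacks: rcv() blocks in the kernel until a message arrives, and each message is handed to exactly one of the parallel readers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::SysV qw(IPC_CREAT);
use IPC::Msg;

# One well-known key shared by all writers and readers on this host.
my $queue = IPC::Msg->new(0xCAFE, IPC_CREAT | 0600)
    or die "msgget failed: $!";

# Writer: type-1 messages carrying a request id (as text).
sub enqueue {
    my ($request_id) = @_;
    $queue->snd(1, $request_id) or die "msgsnd failed: $!";
}

# Reader: blocks until a message arrives -- no polling, and the
# kernel wakes exactly one waiting reader per message.
sub dequeue {
    my $buf;
    defined $queue->rcv($buf, 1024, 1) or die "msgrcv failed: $!";
    return $buf;
}

enqueue(42);
printf "woke up for request %s\n", dequeue();
$queue->remove;                     # drop the queue when done
```

The remote-insertion RPC server would then only need to call enqueue() on behalf of clients on other machines.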
Re: implementing a set of queue-processing servers
Stephen Adkins writes: QUICK AND DIRTY SINGLE-SERVER SOLUTION I implemented a quick-and-dirty single-server solution, where I use a single server to process requests. I simply poll the request table in the database once a minute for new requests, and if they exist, I process them. Now I am looking to upgrade this for higher throughput (multiple parallel servers), lower background load (no polling during quiet periods), and lower latency (immediate response to queue insertion rather than waiting for the next poll interval). I like this solution. Are you finding performance problems? One thing is to execute process_queue right after doing the insert. Remember that databases are great at assuring atomicity and persistence. Apache/mod_perl is the best all around application server available. What's simpler than LWP::Request with a URL and a return of a Perl (or XML if it's another language) data structure? Run eval (or XML::Parser) and off you go. You can wrap this, but how many message types do you have? Rob
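The URL-in, Perl-data-structure-out style Rob describes might be sketched as below. The endpoint URL is invented, and the Safe compartment is my addition, not part of his suggestion: blindly eval'ing a server's response body executes whatever that server sends, so sandboxing the eval seems prudent. The sketch assumes the mod_perl handler on the far end prints something like Data::Dumper output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Safe;

# Turn a response body holding a Perl data-structure literal back into
# a live structure, inside a Safe compartment so a buggy or hostile
# server cannot run arbitrary code in our process.
sub decode_response {
    my ($body) = @_;
    my $data = Safe->new->reval($body);
    die "bad response: $@" if $@;
    return $data;
}

# One RPC: URL in, Perl data structure out.
sub rpc_call {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url);
    die 'RPC failed: ' . $res->status_line unless $res->is_success;
    return decode_response($res->content);
}

# Hypothetical usage:
# my $status = rpc_call('http://app.example.com/queue/status?user=42');
# print "queue depth: $status->{depth}\n";
```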
Re: implementing a set of queue-processing servers
Stephen Adkins writes: The server(s) connect to a mainframe and perform time-consuming, repetitive transactions to collect the data that has been requested. Thus, these servers are slow, waiting several seconds for each response, but they do not put a large load on the local processor. So I want many of them running in parallel. Makes sense. Are you proposing that I use Apache/mod_perl child processes to do the transactions to the mainframe? That doesn't seem right. They are then not available to listen for HTTP requests, which is the whole purpose of an Apache child process. That's the point. By using the Apache/mod_perl processes for all work, you can easily design for peak load. It's all work. You can't serve HTTP requests anyway if your machine is overloaded doing work of other sorts. We do this for e-commerce, web scraping, mail handling, etc. Everything goes through Apache/mod_perl. No fuss, no muss. You seem to advocate Apache/mod_perl for end-user (returning HTML) and server-to-server (RPC) use. That makes sense. But it doesn't seem to make sense for my family of servers that spend all of their time waiting for the mainframe to return their next transaction. Can you do asynchronous I/O? You'll be a lot more efficient memory- and CPU-wise if you send a series of messages and wait for the results to come in. Consuming a Unix/Mainframe process slot (or even a thread) for something like this is very inefficient. I worked on a CORBA-based Web server for Tandem, which didn't use threads. Instead the servers would do asynchronous I/O to the resources they were responsible for. I built the CGI component, which on Tandem was a gateway to Tandem's transaction monitor, Pathway. All CGI processes were managed by a single process which accepted requests via CORBA and fired off messages to Pathway. When Pathway responded, the CORBA response would be sent. Replace CORBA with HTTP, and you have a simpler, more efficient solution. 
One other trick you might try is simply hanging onto the HTTP request until all the jobs for a particular user finish. If you have, say, 50 jobs, and they run in parallel, they might get done in under 30 seconds, which is short enough for a person to wait, and that way you don't deal with the whole database/polling/garbage collection piece. Rob
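Rob's asynchronous-I/O suggestion can be sketched with core IO::Select (the host, port, and one-line "TXN" protocol are all invented stand-ins for the real mainframe gateway): one process fires several slow requests and harvests replies as they arrive, rather than tying up a process or thread per in-flight transaction.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

my $sel = IO::Select->new;
my %pending;                          # socket (stringified) => request id

# Fire off several requests without waiting for any reply.
for my $id (1 .. 5) {
    my $sock = IO::Socket::INET->new(
        PeerAddr => 'gateway.example.com',   # hypothetical mainframe gateway
        PeerPort => 7001,
        Timeout  => 5,
    ) or next;                        # skip hosts we can't reach
    print {$sock} "TXN $id\n";        # send the request, don't block on it
    $sel->add($sock);
    $pending{$sock} = $id;
}

# Harvest replies in whatever order they arrive; one process keeps
# all five conversations in flight at once.
while ($sel->count) {
    my @ready = $sel->can_read(30)
        or last;                      # give up after 30 idle seconds
    for my $sock (@ready) {
        my $reply = <$sock>;
        print "request $pending{$sock} answered: $reply";
        $sel->remove($sock);
        close $sock;
    }
}
```

The same loop structure is what a single dispatcher process (like the Tandem CGI manager Rob describes) would use to shepherd many Pathway-style conversations at once.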