stas        2002/07/31 07:43:17

  Added:       src/docs/general/hardware hardware.pod
               src/docs/general/multiuser multiuser.pod
               src/docs/general/perl_myth perl_myth.pod
               src/docs/general/perl_reference perl_reference.pod
  Log:
  give pods their own dirs

  Revision  Changes    Path
  1.1                  modperl-docs/src/docs/general/hardware/hardware.pod

Index: hardware.pod
===================================================================

=head1 NAME

Choosing an Operating System and Hardware

=head1 Description

Before you use the techniques documented on this site to tune servers and write code, you need to consider the demands which will be placed on the hardware and the operating system. There is no point in investing a lot of time and money in configuration and coding, only to find that your server's performance is poor because you did not choose a suitable platform in the first place.

While the tips below could apply to many web servers, they are aimed primarily at administrators of mod_perl enabled Apache servers.

Because hardware platforms and operating systems are developing rapidly (even while you are reading this document), this discussion must be in general terms.

=head1 Choosing an Operating System

First, let's talk about Operating Systems (OSs). Most of the time I prefer to use Linux or something from the *BSD family. Although I am personally a Linux devotee, I do not want to start yet another OS war. I will try to talk about the characteristics and features you should be looking for to support an Apache/mod_perl server; once you know what you want from your OS, you can go out and find it. Visit the web sites of the operating systems you are interested in. You can gauge users' opinions by searching the relevant discussions in newsgroup and mailing list archives. Deja - http://deja.com and eGroups - http://egroups.com are good examples. I will leave this fan research to the reader.

=head2 Stability and Robustness

Probably the most important features of an OS are stability and robustness. You are in an Internet business. You do not keep normal 9am to 5pm working hours like many conventional businesses you know. You are open 24 hours a day. You cannot afford to be off-line, for your customers will go shop at another service like yours (unless you have a monopoly :). If the OS of your choice crashes every day, first do a little investigation. There might be a simple reason which you can find and fix. However, there are OSs which won't work unless you reboot them twice a day. You don't want to use an OS of this kind, no matter how good the OS vendor's sales department is. Do not follow flashy advertisements; follow developers' advice instead.

Generally, people who have used an OS for some time can tell you a lot about its stability. Ask them. Try to find people who are doing similar things to what you are planning to do; they may even be using the same software. There are often compatibility issues to resolve, and you may need to become familiar with patching and compiling your OS. It's easy.

=head2 Memory Management

You want an OS with good memory management; some OSs are well known as memory hogs. The same code can use twice as much memory on one OS compared to another. If the size of a mod_perl process is 10MB and you have tens of these running, it definitely adds up!

=head2 Memory Leaks

Some OSs and/or their libraries (e.g. C runtime libraries) suffer from memory leaks. A leak is when some process requests a chunk of memory for temporary storage but does not subsequently release it.
The chunk of memory is then not available for any purpose until the process which requested it dies. We cannot afford such leaks. A single mod_perl process sometimes serves thousands of requests before it terminates, so if a leak occurs on every request, the memory demands could become huge. Of course our code can be the cause of memory leaks as well (check out the C<Apache::Leak> module on CPAN). Certainly, we can reduce the number of requests served over the process' life, but that can degrade performance.

=head2 Sharing Memory

We want an OS with good memory-sharing capabilities. As we have seen, if we preload the modules and scripts at server startup, they are shared between the spawned children (at least for a part of a process' life, since memory pages can become "dirty" and cease to be shared). This feature can reduce memory consumption a lot!
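A common way to exploit this sharing is to preload frequently used modules from the server startup file, so that their compiled code ends up in memory pages shared by all the children. Here is a minimal sketch (the choice of modules is illustrative only, and the path is hypothetical):

  # startup.pl -- loaded once by the parent server, e.g. via
  #   PerlRequire /path/to/startup.pl
  use strict;

  use DBI ();              # preload before the children are forked
  use CGI ();
  CGI->compile(':all');    # precompile CGI.pm's autoloaded methods too

  1;

Everything compiled at this point is inherited by every spawned child and stays shared until a given page is written to.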
=head2 Cost and Support

If we are in a big business, we probably do not mind paying another $1000 for some fancy OS with bundled support. But if our resources are low, we will look for cheaper or free OSs. Free does not mean bad; it can be quite the opposite. Free OSs can have the best support we can find. Some do. It is very easy to understand: most people are not rich and will try a cheaper or free OS first if it does the work for them. Since it really fits their needs, many people keep using it and eventually know it well enough to be able to provide support for others in trouble. Why would they do this for free? One reason is the spirit of the first days of the Internet, when there was no commercial Internet and people helped each other, because someone had helped them in the first place. I was there, I was touched by that spirit and I am keen to keep that spirit alive.

But let's get back to our world. We are living in a material world, and our bosses pay us to keep the systems running. So if you feel that you cannot provide the support yourself and you do not trust the available free resources, you must pay for an OS backed by a company, and blame them for any problem. Your boss wants to be able to sue someone if the project has a problem caused by an external product used in the project. If you buy a product and the company selling it claims support, you have someone to sue or at least to put the blame on.

If we go with Open Source and it fails, we do not have someone to sue... wrong--in recent years many companies have realized how good Open Source products are and have started to provide official support for these products. So your boss cannot just dismiss your suggestion of using an Open Source operating system; you can get paid support just as with any commercial OS vendor.

Also remember that the less money you spend on the OS and software, the more you will be able to spend on faster and stronger hardware.

=head2 Discontinued Products

The OSs in this hazard group tend to be developed by a single company or organization. You might find yourself in a position where you have invested a lot of time and money in developing some proprietary software that is bundled with the OS you chose (say a mod_perl handler which takes advantage of some proprietary features of the OS and which will not run on any other OS). Things are under control, the performance is great, and you sing with happiness on your way to work.

Then, one day, the company which supplies your beloved OS goes bankrupt (not unlikely nowadays), or they produce a newer, incompatible version and will not support the old one (happens all the time). You are stuck with their early masterpiece, no support and no source code! What are you going to do? Invest more money into porting the software to another OS...

Everyone can be hit by this mini-disaster, so it is better to check the background of the company when making your choice. Even so, you never know what will happen tomorrow - in 1980, a company called Tektronix did something similar to one of the Guide reviewers with its microprocessor development system. The guy just had to buy another system. He didn't buy it from Tektronix, of course. The second system never really worked very well and the firm he bought it from went bust before they ever got around to fixing it. So in 1982 he wrote his own microprocessor development system software. It didn't take long, it works fine, and he's still using it 18 years later.

Free and Open Source OSs are probably less susceptible to this kind of problem. Development is usually distributed between many companies and developers, so if a person who developed a really important part of the kernel loses interest in continuing, someone else will pick up the falling flag and carry on. Of course if tomorrow some better project shows up, developers might migrate there and finally drop the development: but in practice people are often given support on older versions and helped to migrate to current versions. Development tends to be more incremental than revolutionary, so upgrades are less traumatic, and there is usually plenty of notice of forthcoming changes so that you have time to plan for them.

Of course with Open Source OSs you can have the source! So you can always have a go yourself, but do not underestimate the amount of work involved. There are many, many man-years of work in an OS.

=head2 OS Releases

Actively developed OSs generally try to keep pace with the latest technology developments, and continually optimize the kernel and other parts of the OS to become better and faster. Nowadays, the Internet and networking in general are the hottest topics for system developers. Sometimes a simple OS upgrade to the latest stable version can save you an expensive hardware upgrade. Also, remember that when you buy new hardware, chances are that the latest software will make the most of it.

If a new product supports an old one by virtue of backwards compatibility with previous products of the same family, you might not reap all the benefits of the new product's features. Perhaps you would get almost the same functionality for much less money by buying an older model of the same product.

=head1 Choosing Hardware

Sometimes the most expensive machine is not the one which provides the best performance. Your demands on the platform hardware are based on many aspects and affect many components. Let's discuss some of them.

In the discussion we use terms that may be unfamiliar to some readers:

=over 4

=item *

Cluster - a group of machines connected together to perform one big or many small computational tasks in a reasonable time. Clustering can also be used to provide 'fail-over', where if one machine fails, its processes are transferred to another without interruption of service.
And you may be able to take one of the machines down for maintenance (or an upgrade) and keep your service running - the main server will simply not dispatch requests to the machine that was taken down.

=item *

Load balancing - users are given the name of one of your machines, but perhaps it cannot stand the heavy load. You can use a clustering approach to distribute the load over a number of machines. The central server, which users access initially when they type the name of your service, works as a dispatcher. It just redirects requests to other machines. Sometimes the central server also collects the results and returns them to the users. You can get the advantages of clustering too. There are many load balancing techniques. (See L<High-Availability Linux Project|download::third_party/High_Availability_Linux_Project> for more info.)

=item *

NIC - Network Interface Card. A hardware component that connects your machine to the network. It sends and receives packets; newer cards can also encrypt and decrypt packets and perform digital signing and verification of them. NICs come in different speed categories, varying from 10Mbps to 10Gbps and faster. The most common type of NIC is the one that implements the Ethernet networking protocol.

=item *

RAM - Random Access Memory. It's the memory that you have in your computer. (Comes in units of 8MB, 16MB, 64MB, 256MB, etc.)

=item *

RAID - Redundant Array of Inexpensive Disks. An array of physical disks, usually treated by the operating system as one single disk, and often forced to appear that way by the hardware. The reason for using RAID is often simply to achieve a high data transfer rate, but it may also be to get adequate disk capacity or high reliability. Redundancy means that the system is capable of continued operation even if a disk fails. There are various types of RAID arrays and several different approaches to implementing them. Some systems provide protection against failure of more than one drive, and some (`hot-swappable') systems allow a drive to be replaced without even stopping the OS. See for example the Linux `HOWTO' documents Disk-HOWTO, Module-HOWTO and Parallel-Processing-HOWTO.

=back

=head2 Machine Strength Demands According to Expected Site Traffic

If you are building a fan site and you want to amaze your friends with a mod_perl guest book, any old 486 machine could do it. If you are in a serious business, it is very important to build a scalable server. If your service is successful and becomes popular, the traffic could double every few days, and you should be ready to add more resources to keep up with the demand. While we could define web server scalability more precisely, the important thing is to make sure that you can add more power to your web server(s) without investing much additional money in software development (you will need a little software effort to connect your servers, if you add more of them). This means that you should choose hardware and OSs that can talk to other machines and become part of a cluster.

On the other hand, if you prepare for a lot of traffic and buy a monster to do the work for you, what happens if your service doesn't prove to be as successful as you thought it would be? Then you've spent too much money, and meanwhile faster processors and other hardware components have been released, so you lose.
Wisdom and prophecy, that's all it takes :)

=head3 Single Strong Machine vs Many Weaker Machines

Let's start with the claim that a four-year-old processor is still very powerful and can be put to good use. Now let's say that for a given amount of money you can probably buy either one new, very strong machine or about ten older but very cheap machines. I claim that with ten old machines connected into a cluster, and by deploying load balancing, you will be able to serve about five times more requests than with one single new machine.

Why is that? Because generally the performance improvement on a new machine is marginal while the price is much higher. Ten machines will do faster disk I/O than one single machine, even if the new disk is quite a bit faster. Yes, you have more administration overhead, but there is a chance you will have it anyway, for in a short time the new machine you have just bought might not stand the load. Then you will have to purchase more equipment and think about how to implement load balancing and web server file system distribution anyway.

Why am I so convinced? Look at the busiest services on the Internet: search engines, web-based email servers and the like -- most of them use a clustering approach. You may not always notice it, because they hide the real implementation behind proxy servers.

=head2 Internet Connection

You have the best hardware you can get, but the service is still crawling. Make sure you have a fast Internet connection. Not as fast as your ISP claims it to be, but as fast as it should be. The ISP might have a very good connection to the Internet, but put many clients on the same line. If these are heavy clients, your traffic will have to share the same line and your throughput will suffer. Think about a dedicated connection and make sure it is truly dedicated. Don't trust the ISP, check it!

The idea of having a connection to B<The Internet> is a little misleading. Many web hosting and co-location companies have large amounts of bandwidth, but still have poor connectivity. The public exchanges, such as MAE-East and MAE-West, frequently become overloaded, yet many ISPs depend on these exchanges. Private peering means that providers can exchange traffic much more quickly.

Also, if your web site is of global interest, check that the ISP has good global connectivity. If the web site is going to be visited mostly by people in a certain country or region, your server should probably be located there.

Bad connectivity can directly influence your machine's performance. Here is a story one of the developers told on the mod_perl mailing list:

  What relationship has 10% packet loss on one upstream provider got
  to do with machine memory?  Yes.. a lot.  For a nightmare week, the
  box was located downstream of a provider who was struggling with
  some serious bandwidth problems of his own... people were
  connecting to the site via this link, and packet loss was such that
  retransmits and tcp stalls were keeping httpd heavies around for
  much longer than normal.. instead of blasting out the data at high
  or even modem speeds, they would be stuck at 1k/sec or stalled
  out... people would press stop and refresh, httpds would take 300
  seconds to timeout on writes to no-one.. it was a nightmare.  Those
  problems didn't go away till I moved the box to a place closer to
  some decent backbones.

  Note that with a proxy, this only keeps a lightweight httpd tied
  up, assuming the page is small enough to fit in the buffers.  If
  you are a busy internet site you always have some slow clients.
This is a difficult thing to simulate in benchmark testing, though.

=head2 I/O Performance

If your service is I/O bound (does a lot of read/write operations to disk), you need a very fast disk, especially if you need a relational database, since databases are the main I/O stream creators. So you should not spend the money on a video card and monitor! A cheap card and a 14" monochrome monitor are perfectly adequate for a web server; you will probably access it by C<telnet> or C<ssh> most of the time anyway. Look for disks with the best price/performance ratio. Of course, ask around and avoid disks that have a reputation for head-crashes and other disasters.

You must think about RAID or similar systems if you have an enormous data set to serve (what is an enormous data set nowadays? Gigabytes, Terabytes?) or you expect really heavy web traffic.

Ok, you have a fast disk, what's next? You need a fast disk controller. There may be one embedded on your computer's motherboard. If the controller is not fast enough, you should buy a faster one. Don't forget that it may be necessary to disable the original controller.

=head2 Memory

Memory should be well tested. Many memory test programs are practically useless. Running a busy system for a few weeks without ever shutting it down is a pretty good memory test. If you increase the amount of RAM on a well-tested box, use well-tested RAM.

How much RAM do you need? Nowadays, the chances are that you will hear: "Memory is cheap, the more you buy the better". But how much is enough? The answer is pretty straightforward: I<you do not want your machine to swap>. When the CPU needs to write something into memory, but memory is already full, it takes the least frequently used memory pages and swaps them out to disk. This means you have to bear the time penalty of writing the data to disk. If another process then references some of the data which happens to be on one of the pages that has just been swapped out, the CPU swaps it back in again, probably swapping out some other data that will be needed very shortly by some other process. Carried to the extreme, the CPU and disk start to I<thrash> hopelessly in circles, without getting any real work done. The less RAM there is, the more often this scenario arises. Worse, you can exhaust swap space as well, and then your troubles really start...

How do you make a decision? You know the highest rate at which your server expects to serve pages and how long it takes on average to serve one. Now you can calculate how many server processes you need. If you know the maximum size to which your servers can grow, you know how much memory you need. If your OS supports L<memory sharing|general::hardware::hardware/Sharing_Memory>, you can make best use of this feature by preloading the modules and scripts at server startup, and so you will need less memory than you have calculated. (A worked sketch of this arithmetic follows at the end of this section.)

Do not forget that other essential system processes need memory as well, so you should plan not only for the web server, but also take into account the other players. Remember that requests can be queued, so you can afford to let your clients wait for a few moments until a server is available to serve them. Most of the time your server will not be under maximum load, but you should be ready to bear the peaks. You need to reserve at least 20% of free memory for peak situations. Many sites have crashed a few moments after a big scoop about them was posted and an unexpected number of requests suddenly came in. (This is called the Slashdot effect, which was born at http://slashdot.org .) If you are about to announce something cool, be aware of the possible consequences.
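Here is the promised back-of-the-envelope sketch of that sizing arithmetic. All the numbers are invented for illustration; substitute your own measurements:

  #!/usr/bin/perl -w
  # rough mod_perl server sizing -- illustrative numbers only
  my $peak_rate = 40;    # expected peak requests per second
  my $duration  = 0.5;   # average seconds to serve one request
  my $proc_size = 10;    # MB of (unshared) memory per server process

  my $servers = $peak_rate * $duration;        # 20 concurrent servers
  my $memory  = $servers * $proc_size * 1.2;   # plus the 20% reserve: 240MB

  print "plan for $servers servers and ${memory}MB of RAM\n";

Remember that this estimates the web server alone; the other system processes still need their share on top of it.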
=head2 CPU

Make sure that the CPU is operating within its specifications. Many boxes are shipped with incorrect settings for CPU clock speed, power supply voltage, etc. Sometimes a cooling fan is not fitted, or it may be ineffective because a cable assembly fouls the fan blades. Like faulty RAM, an overheating processor can cause all kinds of strange and unpredictable things to happen. Some CPUs are known to have bugs which can be serious in certain circumstances. Try not to get one of them.

=head2 Bottlenecks

You might use the most expensive components, but still get bad performance. Why? Let me introduce an annoying word: bottleneck.

A machine is an aggregate of many components. Almost any one of them may become a bottleneck. If you have a fast processor but a small amount of RAM, the RAM will probably be the bottleneck. The processor will be under-utilized; usually it will be waiting for the kernel to swap memory pages in and out, because memory is too small to hold the busiest pages.

If you have a lot of memory, a fast processor and a fast disk, but a slow disk controller, the disk controller will be the bottleneck. The performance will still be bad, and you will have wasted money.

Use a fast NIC that does not create a bottleneck. They are cheap. If the NIC is slow, the whole service is slow. This is a most important component, since web servers are much more often network-bound than they are disk-bound!

=head3 Solving Hardware Requirement Conflicts

It may happen that the combination of software components which you find yourself using gives rise to conflicting requirements for the optimization of tuning parameters. If you can separate the components onto different machines, you may find that this approach (a kind of clustering) solves the problem, at much less cost than buying faster hardware, because you can tune the machines individually to suit the tasks they should perform.

For example, if you need to run a relational database engine and a mod_perl server, it can be wise to put the two on different machines: an RDBMS needs a very fast disk, while mod_perl processes need lots of memory. By placing the two on different machines it's easy to optimize each machine separately and satisfy each software component's requirements in the best way.

=head2 Conclusion

To use your money optimally you have to understand the hardware very well, so you will know what to pick. Otherwise, you should hire a knowledgeable hardware consultant and employ them on a regular basis, since your needs will probably change as time goes by and your hardware will likewise be forced to adapt.

=head1 Maintainers

Maintainer is the person(s) you should contact with updates, corrections and patches.

=over

=item *

Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

=head1 Authors

=over

=item *

Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

Only the major authors are listed above. For contributors see the Changes file.

=cut

  1.1                  modperl-docs/src/docs/general/multiuser/multiuser.pod

Index: multiuser.pod
===================================================================

=head1 NAME

mod_perl for ISPs. mod_perl and Virtual Hosts

=head1 Description

mod_perl hosting by ISPs: fantasy or reality?
This section covers some topics that might be of interest to users looking for an ISP to host their mod_perl-based web site, and to ISPs looking for a way to provide such services.

Today it is a reality: there are a number of ISPs hosting mod_perl, although the number is not as big as we would like it to be. To see a list of ISPs that can provide mod_perl hosting, see L<ISPs supporting mod_perl|help::isps>.

=head1 ISPs providing mod_perl services - a fantasy or a reality

=over 4

=item *

You installed mod_perl on your box at home, and you fell in love with it. So now you want to convert your CGI scripts (which currently are running on your favorite ISP's machine) to run under mod_perl. Then you discover that your ISP has never heard of mod_perl, or refuses to install it for you.

=item *

You are an old sailor in the ISP business, you have seen it all, you know how many ISPs are out there and you know that the sales margins are too low to keep you happy. You are looking for some new service almost no one else provides, to attract more clients to become your users and hopefully to get a bigger slice of the action than your competitors.

=back

If you are a user asking for a mod_perl service, or an ISP considering providing this service, this section should make things clear for both of you.

An ISP has three choices:

=over 4

=item 1

ISPs probably cannot let users run scripts under mod_perl on the main server. There are many reasons for this:

Scripts might leak memory, due to sloppy programming. There will not be enough memory to run as many servers as required, and clients will not be satisfied with the service because it will be slower.

The question of file permissions is a very important issue: any user who is allowed to write and run a CGI script can at least read (if not write) any other files that belong to the same user and/or group the web server is running as. Note that L<it's impossible to run C<suEXEC> and C<cgiwrap> extensions under mod_perl 1.0|guide::install/Is_it_possible_to_run_mod_perl_enabled_Apache_as_suExec_>.

Another issue is the security of database connections. If you use C<Apache::DBI>, by hacking the C<Apache::DBI> code you can pick a connection from the pool of cached connections, even if it was opened by someone else, as long as your scripts run on the same web server.

Yet another security issue is a potential compromise of the system via user code running on the web server. One of the possible solutions is to use the chroot(1) or jail(8) mechanisms, which allow you to run subsystems isolated from the main system. If a subsystem gets compromised, the whole system is still safe.

There are many more things to be aware of, so at this time you have to say I<No>.

Of course as an ISP you can run mod_perl internally, without allowing your users to map their scripts so that they will run under mod_perl. If as a part of your service you provide scripts such as guest books, counters etc. which are not available for user modification, you can still have these scripts running very fast.

=item 2

But hey, why can't I let my users run their own servers, so I can wash my hands of them and not have to worry about how dirty and sloppy their code is (assuming that the users are running their servers under their own usernames, to prevent them from stealing code and data from each other)?

This option is fine as long as you are not concerned about your new system's resource requirements.
If you have even very limited experience with mod_perl, you know that mod_perl enabled Apache servers, while freeing up your CPU and allowing you to run scripts very much faster, have huge memory demands (5-20 times those of plain Apache).

The size depends on the code length, the sloppiness of the programming, possible memory leaks the code might have, and all that multiplied by the number of children each server spawns. A very simple example: a server serving an average number of scripts, demanding 10MB of memory and spawning 10 children, already raises your memory requirements by 100MB (the real requirement is actually much smaller if your OS allows code sharing between processes and programmers exploit these features in their code). Now multiply the average required size by the number of server users you intend to have and you will get the total memory requirement.

Since ISPs never say I<No>, you'd better take the inverse approach - think of the largest memory size you can afford, then divide it by one user's requirements as shown in this example, and you will know how many mod_perl users you can afford :)

But you cannot tell how much memory your users may use? Their requirements for a single server may be very modest, but do you know how many servers they will run? After all, they have full control of I<httpd.conf> - and it has to be this way, since this is essential for the user running mod_perl.

All this rumbling about memory leads to a single question: is it possible to prevent users from using more than X memory? Or another variation of the question: assuming you have as much memory as you want, can you charge users for their average memory usage?

If the answer to either of the above questions is I<Yes>, you are all set and your clients will praise your name for letting them run mod_perl! There are tools to restrict resource usage: see for example the man pages for C<ulimit(3)>, C<getrlimit(2)>, C<setrlimit(2)> and C<sysconf(3)>; the last three have corresponding Perl interfaces in the C<BSD::Resource> and C<Apache::Resource> modules. (A sketch of this technique follows right after this list.)

[ReaderMETA]: If you have experience with other resource limiting techniques please share it with us. Thank you!

If you have chosen this option, you have to provide your clients with:

=over 4

=item *

Shutdown and startup scripts installed together with the rest of your daemon startup scripts (e.g. in the I</etc/rc.d> directory), so that when you reboot your machine the user's server will be correctly shut down and will be back online the moment your system starts up. Also make sure to start each server under the username the server belongs to, or you are going to be in big trouble!

=item *

Proxy services (in forward or httpd accelerator mode) for the user's virtual host. Since the user will have to run their server on an unprivileged port (E<gt>1024), you will have to forward all requests from C<user.given.virtual.hostname:80> (which is just C<user.given.virtual.hostname>, since 80 is the default port) to C<your.machine.ip:port_assigned_to_user>. You will also have to tell the users to code their scripts so that any self-referencing URLs are of the form C<user.given.virtual.hostname>.

Letting the user run a mod_perl server immediately adds a requirement for the user to be able to restart and configure their own server. Only root can bind to port 80; this is why your users have to use port numbers greater than 1024.
Another solution would be to use a setuid startup script, but think twice before you go with it, since if users can modify the scripts they will gain root access. For more information refer to the section "L<SUID Start-up Scripts|general::control::control/SUID_Start_up_Scripts>".

=item *

Another problem you will have to solve is how to assign ports between users. Since users can pick any port above 1024 to run their server on, you will have to lay down some rules here so that multiple servers do not conflict.

A simple example will demonstrate the importance of this problem: suppose I am a malicious user, or just a rival of some fellow who runs his server on your ISP. All I need to do is find out what port my rival's server is listening on (e.g. using C<netstat(8)>) and configure my own server to listen on the same port. Although I am unable to bind to it while his server runs, imagine what will happen when you reboot your system and my startup script happens to be run before my rival's! I get the port first, and now all requests will be redirected to my server. I'll leave it to your imagination what nasty things might happen then. Of course the ugly things will quickly be revealed, but not before the damage has been done.

Luckily there are special tools that can ensure that users who aren't authorized to bind to certain ports (above 1024) won't be able to do so. One such tool is called C<cbs> and its documentation can be found at I<http://www.epita.fr/~flav/cbs/doc/html>.

=back

Basically you can preassign each user a port, without them having to worry about finding a free one, as well as enforce C<MaxClients> and similar values, by implementing the following scenario:

For each user have two configuration files: the main file, I<httpd.conf> (non-writable by the user), and the user's file, I<username.httpd.conf>, where they can specify their own configuration parameters and override the ones defined in I<httpd.conf>. Here is what the main configuration file looks like:

  httpd.conf
  ----------
  # Global/default settings, the user may override some of these
  ...
  ...
  # Included so that user can set his own configuration
  Include username.httpd.conf

  # User-specific settings which will override any potentially
  # dangerous configuration directives in username.httpd.conf
  ...
  ...

  username.httpd.conf
  -------------------
  # Settings that your user would like to add/override,
  # like <Location> and PerlModule directives, etc.

Apache reads the global/default settings first. Then it reads the I<Include>'d I<username.httpd.conf> file with whatever settings the user has chosen, and finally it reads the user-specific settings that we don't want the user to override, such as the port number. Even if the user changes the port number in his I<username.httpd.conf> file, Apache reads our settings last, so they take precedence. Note that you can use L<Perl sections|guide::config/Apache_Configuration_in_Perl> to make the configuration much easier.

=item 3

A much better, but costly solution is I<co-location>. Let the user hook his (or your) stand-alone machine into your network, and forget about this user. Of course either the user or you will have to undertake all the system administration chores, and it will cost your client more money.

Who are the people who seek mod_perl support? They are people who run serious projects/businesses. Money is not usually an obstacle. They can afford a stand-alone box, thus achieving their goal of autonomy whilst keeping their ISP happy.

=back
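Returning to the resource-limiting question raised in option 2 above, here is a minimal, hedged sketch of how such limits can be imposed with C<BSD::Resource> from a child-init handler. The package name and the limit value are invented for illustration; C<Apache::Resource> packages the same idea as a ready-made handler driven by C<PERL_RLIMIT_*> environment variables:

  # in startup.pl -- a sketch, not a drop-in solution
  package My::Limits;
  use BSD::Resource ();

  sub handler {
      # cap each child's data segment at 64MB (soft and hard limit)
      BSD::Resource::setrlimit(
          BSD::Resource::RLIMIT_DATA(), 64 * 2**20, 64 * 2**20
      ) or warn "could not set RLIMIT_DATA\n";
      return 0;   # OK
  }

  1;

It would be installed from the non-writable I<httpd.conf> with C<PerlChildInitHandler My::Limits>, so each child imposes the limit on itself right after the fork, and a runaway script runs out of memory instead of swamping the whole machine.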
=head2 Virtual Servers Technologies

As we have just seen, one of the obstacles to using mod_perl in an ISP environment is the problem of isolating customers using the same machine from each other. A number of virtual server technologies (not to be confused with virtual hosts), both commercial and Open Source, exist today. Here are some of them:

=over

=item * The User-mode Linux Kernel

http://user-mode-linux.sourceforge.net/

User-Mode Linux is a safe, secure way of running Linux versions and Linux processes. Run buggy software, experiment with new Linux kernels or distributions, and poke around in the internals of Linux, all without risking your main Linux setup.

User-Mode Linux gives you a virtual machine that may have more hardware and software virtual resources than your actual, physical computer. Disk storage for the virtual machine is entirely contained inside a single file on your physical machine. You can assign your virtual machine only the hardware access you want it to have. With properly limited access, nothing you do on the virtual machine can change or damage your real computer, or its software.

So if you want to completely protect one user from another, and yourself from your users, this might be yet another alternative to the solutions suggested at the beginning of this chapter.

=item * VMWare Technology

Allows running a few instances of the same or different OSs on the same machine. This technology comes in two flavors:

Open Source: http://www.plex86.org/

Commercial: http://www.vmware.com/

So you may want to run a separate OS for each of your clients.

=item * freeVSD Technology

freeVSD (http://www.freevsd.org) is an open source project sponsored by Idaya Ltd. The software enables ISPs to securely partition their physical servers into many I<virtual servers>, each capable of running popular hosting applications such as Apache, Sendmail and MySQL.

=item * S/390 IBM server

Quoting from http://www.s390.ibm.com/linux/vif/ :

"The S/390 Virtual Image Facility enables you to run tens to hundreds of Linux server images on a single S/390 server. It is ideally suited for those who want to move Linux and/or UNIX workloads deployed on multiple servers onto a single S/390 server, while maintaining the same number of distinct server images. This provides centralized management and operation of the multiple image environment, reducing complexity, easing administration and lowering costs."

In two words, this is a great solution for huge ISPs, as it allows you to run hundreds of mod_perl servers while having only one box to maintain.
The drawback is the price :)

Check out this thread on the I<scalable> mailing list for more details from those who know:

http://archive.develooper.com/[EMAIL PROTECTED]/msg00235.html

=back

=head1 Virtual Hosts in the guide

If you are about to use I<Virtual Hosts> you might want to read these sections:

L<Apache Configuration in Perl|guide::config/Apache_Configuration_in_Perl>

L<Easing the Chores of Configuring Virtual Hosts with mod_macro|guide::config/Configuring_Apache___mod_perl_with_mod_macro>

L<Is There a Way to Provide a Different startup.pl File for Each Individual Virtual Host|guide::config/Is_There_a_Way_to_Provide_a_Different_startup_pl_File_for_Each_Individual_Virtual_Host>

L<Is There a Way to Modify @INC on a Per-Virtual-Host or Per-Location Basis.|guide::config/Is_There_a_Way_to_Modify__INC_on_a_Per_Virtual_Host_or_Per_Location_Basis_>

L<A Script From One Virtual Host Calls a Script with the Same Path From the Other Virtual Host|guide::config/A_Script_From_One_Virtual_Host_Calls_a_Script_with_the_Same_Path_From_the_Other_Virtual_Host>

=head1 Maintainers

Maintainer is the person(s) you should contact with updates, corrections and patches.

=over

=item *

Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

=head1 Authors

=over

=item *

Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

Only the major authors are listed above. For contributors see the Changes file.

=cut

  1.1                  modperl-docs/src/docs/general/perl_myth/perl_myth.pod

Index: perl_myth.pod
===================================================================

=head1 NAME

Popular Perl Complaints and Myths

=head1 Description

This document tries to dispel the myths about Perl and overturn the FUD certain bodies try to spread.

=head1 Abbreviations

=over 4

=item * B<M> = Misconception or Myth

=item * B<R> = Response

=back

=head2 Interpreted vs. Compiled

=over 4

=item M:

Each dynamic Perl page hit needs to load the Perl interpreter and compile the script, then run it, each time a dynamic web page is hit. This dramatically decreases performance and makes Perl an unscalable model, since so much overhead is required to serve each page.

=item R:

This myth was true years ago, before the advent of mod_perl. mod_perl loads the interpreter once into memory and never needs to load it again. Each Perl program is compiled only once. The compiled version is then kept in memory and used each time the program is run. In this way there is no extra overhead when hitting a mod_perl page.

=back

=head3 Interpreted vs. Compiled (More Gory Details)

=over 4

=item R:

Compiled code always has the potential to be faster than interpreted code. Ultimately, all interpreted code needs to be converted to native instructions at some point, and this invariably has to be done by a compiled application.

That said, an interpreted language CAN be faster than a comparable native application in certain situations, given certain common programming practices. For example, the allocation and de-allocation of memory can be a relatively expensive process in a tightly scoped compiled language, whereas interpreted languages typically use garbage collectors which don't need to do expensive deallocation in a tight loop, instead waiting until additional memory is absolutely necessary, or for a less computationally intensive period. Of course, using a garbage collector in C would eliminate this edge in this situation, but where using garbage collectors in C is uncommon, Perl and most other interpreted languages have built-in garbage collectors.
It is also important to point out that few people use the full potential of their modern CPU with a single application. Modern CPUs are not only more than fast enough to run interpreted code; many processors include instruction sets designed to increase the performance of interpreted code.

=back

=head2 Perl is overly memory intensive, making it unscalable

=over 4

=item M:

Each child process needs the Perl interpreter and all code in memory. Even with mod_perl, httpd processes tend to be overly large, slowing performance and requiring much more hardware.

=item R:

In mod_perl the interpreter is loaded into the parent process and shared between the children. Also, when scripts are loaded into the parent and the parent forks a child httpd process, that child shares those scripts with the parent. So while the child may take 6MB of memory, 5MB of that might be shared, meaning it only really uses 1MB per child. Even 5MB of memory per child is not uncommon for most web applications in other languages.

Also, most modern operating systems support the concept of shared libraries. Perl can be compiled as a shared library, enabling the bulk of the Perl interpreter to be shared between processes. Some executable formats on some platforms (I believe ELF is one such format) are able to share entire executable TEXT segments between unrelated processes.

=back

=head3 More Tuning Advice:

=over 4

=item *

L<Stas Bekman's Performance Guide|guide::performance>

=back

=head2 Not enough support, or tools to develop with Perl. (Myth)

=over 4

=item R:

Of all web application languages, Perl arguably has the most support and tools. B<CPAN> is a central repository of Perl modules which are freely downloadable and usually well supported. There are literally thousands of modules which make building web apps in Perl much easier. There are also countless mailing lists of extremely responsive Perl experts who usually respond to questions within an hour. There are also a number of Perl development environments to make building Perl web applications easier. Just to name a few, there are C<Apache::ASP>, C<Mason>, C<Embperl>, C<ePerl>, etc...

=back

=head2 If Perl scales so well, how come no large sites use it? (Myth)

=over 4

=item R:

Actually, many large sites DO use Perl for the bulk of their web applications. Here are some, just as an example: B<eToys>, B<CitySearch>, B<Internet Movie Database> ( http://imdb.com ), B<ValueClick> ( http://valueclick.com ), B<Paramount Digital Entertainment>, B<CMP> ( http://cmpnet.com ), B<HotBot Mail>/B<HotBot Homepages>, and B<DejaNews>, to name a few. Even B<Microsoft> has taken an interest in Perl via http://www.activestate.com/.

=back

=head2 Perl, even with mod_perl, is always slower than C.

=over 4

=item R:

The Perl engine is written in C. There is no point arguing that Perl is faster than C, because anything written in Perl could obviously be re-written in C. The same holds true for arguing that C is faster than assembly.

There are two issues to consider here. First of all, many times a web application written in Perl B<CAN be faster> than one written in C, thanks to the low level optimizations in the Perl compiler. In other words, it's easier to write poorly written C than well written Perl. Secondly, it's important to weigh all factors when choosing a language to build a web application in. Time to market is often one of the highest priorities in creating a web application. Development in Perl can often be twice as fast as in C.
This is mostly due to the differences in the languages themselves, as well as the wealth of free examples and modules which speed development significantly. Perl's speedy development time can be a huge competitive advantage.

=back

=head2 Java does away with the need for Perl.

=over 4

=item M:

Perl had its place in the past, but now there's Java, and Java will kill Perl.

=item R:

Java and Perl are actually more complementary languages than competitive. It's widely accepted that server-side Java solutions such as C<JServ>, C<JSP> and C<JRun> are far slower than mod_perl solutions (see the next myth). Even so, Java is often used as the front end for server-side Perl applications. Unlike Perl, with Java you can create advanced client-side applications. Combined with the strength of server-side Perl, these client-side Java applications can be made very powerful.

=back

=head2 Perl can't create advanced client side applications

=over 4

=item R:

True. There are some client-side Perl solutions, like PerlScript in MSIE 5.0, but all client-side Perl requires the user to have the Perl interpreter on their local machine. Most users do not have a Perl interpreter on their local machine. Most Perl programmers who need to create an advanced client-side application use Java as their client-side programming language and Perl as the server-side solution.

=back

=head2 ASP makes Perl obsolete as a web programming language.

=over 4

=item M:

With Perl you have to write individual programs for each set of pages. With ASP you can write simple code directly within HTML pages. ASP is the Perl killer.

=item R:

There are many solutions which allow you to embed Perl in web pages just like ASP. In fact, you can actually use Perl IN ASP pages with PerlScript. Other solutions include: C<Mason>, C<Apache::ASP>, C<ePerl>, C<Embperl> and C<XPP>. Also, Microsoft and ActiveState have worked very hard to make Perl run equally well on NT as on Unix. You can even create COM modules in Perl that can be used from within ASP pages. Some other advantages Perl has over ASP: mod_perl is usually much faster than ASP, Perl has much more example code and many more full programs which are freely downloadable, and Perl is cross-platform, able to run on Solaris, Linux, SCO, Digital Unix, Unix V, AIX, OS/2, VMS, MacOS, Win95-98 and NT, to name a few. Also, benchmarks show that embedded Perl solutions outperform ASP/VB on IIS by several orders of magnitude. Perl is a much easier language for some to learn, especially those with a background in C or C++.

=back

=head1 Credits

Thanks to the mod_perl list for all of the good information and criticism. I'd especially like to thank:

=over 4

=item * Stas Bekman E<lt>[EMAIL PROTECTED]E<gt>

=item * Thornton Prime E<lt>[EMAIL PROTECTED]E<gt>

=item * Chip Turner E<lt>[EMAIL PROTECTED]E<gt>

=item * Clinton E<lt>[EMAIL PROTECTED]E<gt>

=item * Joshua Chamas E<lt>[EMAIL PROTECTED]E<gt>

=item * John Edstrom E<lt>[EMAIL PROTECTED]E<gt>

=item * Rasmus Lerdorf E<lt>[EMAIL PROTECTED]E<gt>

=item * Nedim Cholich E<lt>[EMAIL PROTECTED]E<gt>

=item * Mike Perry E<lt> http://www.icorp.net/icorp/feedback.htm E<gt>

=item * Finally, I'd like to thank Robert Santos E<lt>[EMAIL PROTECTED]E<gt>, CyberNation's lead Business Development guy, for inspiring this document.

=back

=head1 Maintainers

Maintainer is the person(s) you should contact with updates, corrections and patches.
=over

=item *

Contact the L<mod_perl docs list|maillist::docs-dev>

=back

=head1 Authors

=over

=item *

Adam Pisoni E<lt>[EMAIL PROTECTED]E<gt>

=back

Only the major authors are listed above. For contributors see the Changes file.

=cut

  1.1                  modperl-docs/src/docs/general/perl_reference/perl_reference.pod

Index: perl_reference.pod
===================================================================

=head1 NAME

Perl Reference

=head1 Description

This document was born because some users are reluctant to learn Perl before jumping into mod_perl. I will try to cover some of the most frequent pure-Perl questions asked on the list.

Before you decide to skip this chapter, make sure you know all the information provided here. The rest of the Guide assumes that you have read this chapter and understood it.

=head1 perldoc's Rarely Known But Very Useful Options

First of all, I want to stress that you cannot become a Perl hacker without knowing how to read Perl documentation and search through it. Books are good, but an easily accessible and searchable Perl reference at your fingertips is a great time saver. It always has up-to-date information for the version of perl you're using.

Of course you can use the online Perl documentation on the Web. The two major sites are http://www.perldoc.com and http://theoryx5.uwinnipeg.ca/CPAN/perl/.

The C<perldoc> utility provides you with access to the documentation installed on your system. To find out what Perl manpages are available, execute:

  % perldoc perl

To find what functions Perl has, execute:

  % perldoc perlfunc

To learn the syntax and find examples of a specific function, you would execute (e.g. for C<open()>):

  % perldoc -f open

Note: In perl5.005_03 and earlier, there is a bug in this and the C<-q> option of C<perldoc>. It won't call C<pod2man>, but will display the section in POD format instead. Despite this bug it's still readable and very useful.

The Perl FAQ (the I<perlfaq> manpage) is in several sections. To search through the sections for C<open> you would execute:

  % perldoc -q open

This will show you all the matching Question and Answer sections, still in POD format.

To read the I<perldoc> manpage you would execute:

  % perldoc perldoc

=head1 Tracing Warnings Reports

Sometimes it's very hard to understand what a warning is complaining about. You see the source code, but you cannot understand why some specific snippet produces that warning. The mystery often results from the fact that the code can be called from different places if it's located inside a subroutine.

Here is an example:

  warnings.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  correct();
  incorrect();

  sub correct{
      print_value("Perl");
  }

  sub incorrect{
      print_value();
  }

  sub print_value{
      my $var = shift;
      print "My value is $var\n";
  }

In the code above, print_value() prints the passed value. Subroutine correct() passes the value to print, but in subroutine incorrect() we forgot to pass it. When we run the script:

  % ./warnings.pl

we get the warning:

  Use of uninitialized value at ./warnings.pl line 16.

Perl complains about an undefined variable C<$var> at the line that attempts to print its value:

  print "My value is $var\n";

But how do we know why it is undefined? The reason here obviously is that the calling function didn't pass the argument. But how do we know who the caller was? In our example there are two possible callers; in the general case there can be many of them, perhaps located in other files.
We can use the caller() function, which tells us who called us, but even that might not be enough: it's possible to have a longer sequence of called subroutines, and not just two. For example, here it is sub third() which is at fault, and calling caller() in sub second() would not help us very much:

  sub third{
      second();
  }

  sub second{
      my $var = shift;
      first($var);
  }

  sub first{
      my $var = shift;
      print "Var = $var\n";
  }

The solution is quite simple. What we need is a full call-stack trace back to the call that triggered the warning. The C<Carp> module comes to our aid with its cluck() function. Let's modify the script by adding a couple of lines. The rest of the script is unchanged.

  warnings2.pl
  ------------
  #!/usr/bin/perl -w

  use strict;
  use Carp ();

  local $SIG{__WARN__} = \&Carp::cluck;

  correct();
  incorrect();

  sub correct{
      print_value("Perl");
  }

  sub incorrect{
      print_value();
  }

  sub print_value{
      my $var = shift;
      print "My value is $var\n";
  }

Now when we execute it, we see:

  Use of uninitialized value at ./warnings2.pl line 19.
    main::print_value() called at ./warnings2.pl line 14
    main::incorrect() called at ./warnings2.pl line 7

Take a moment to understand the call-stack trace. The deepest calls are printed first. So the second line tells us that the warning was triggered in print_value(); the third, that print_value() was called by the subroutine incorrect():

  script => incorrect() => print_value()

We go into C<incorrect()> and indeed see that we forgot to pass the variable. Of course when you write a subroutine like C<print_value>, it would be a good idea to check the passed arguments before starting execution. We omitted that step to contrive an easily debugged example.

Sure, you say, I could find that problem by simple inspection of the code! Well, you're right. But I promise you that your task would be quite complicated and time consuming if your code had some thousands of lines. In addition, under mod_perl, certain uses of the C<eval> operator and "here documents" are known to throw off Perl's line numbering, so the messages reporting warnings and errors can have incorrect line numbers. (See L<Finding the Line Which Triggered the Error or Warning|guide::debug/Finding_the_Line_Which_Triggered> for more information.) Getting the trace helps a lot.

=head1 Variables Globally, Lexically Scoped And Fully Qualified

META: this material is new and requires polishing so read with care.

You will hear a lot about namespaces, symbol tables and lexical scoping in Perl discussions, but little of it will make any sense without a few key facts:

=head2 Symbols, Symbol Tables and Packages; Typeglobs

There are two important types of symbol: package global and lexical. We will talk about lexical symbols later; for now we will talk only about package global symbols, which we will refer to simply as I<global symbols>.

The names of pieces of your code (subroutine names) and the names of your global variables are symbols. Global symbols reside in one symbol table or another. The code itself and the data do not; the symbols are the names of pointers which point (indirectly) to the memory areas which contain the code and data. (Note for C/C++ programmers: we use the term `pointer' in the general sense of one piece of data referring to another piece of data, not in the specific sense used in C or C++.)

There is one symbol table for each package (which is why I<global symbols> are really I<package global symbols>). You are always working in one package or another.
Like in C, where the first function you write must be called main(), the first statement of your first Perl script is in package C<main::>, which is the default package. Unless you say otherwise by using the C<package> statement, your symbols are all in package C<main::>. You should be aware straight away that files and packages are I<not related>. You can have any number of packages in a single file, and a single package can be in one file or spread over many files. However, it is very common to have a single package in a single file. To declare a package you write:

  package mypackagename;

From the following line on, you are in package C<mypackagename>, and any symbols you declare reside in that package. When you create a symbol (variable, subroutine etc.), Perl uses the name of the package in which you are currently working as a prefix to create the fully qualified name of the symbol.

When you create a symbol, Perl creates a symbol table entry for that symbol in the current package's symbol table (by default C<main::>). Each symbol table entry is called a I<typeglob>. Each typeglob can hold information on a scalar, an array, a hash, a subroutine (code), a filehandle, a directory handle and a format, all of which have the same name. So you see now that there are two indirections for a global variable: the symbol (the thing's name) points to its typeglob, and the typeglob slot for the thing's type (scalar, array, etc.) points to the data. If we had a scalar and an array with the same name, their name would point to the same typeglob, but for each type of data the typeglob points somewhere different, so the scalar's data and the array's data are completely separate and independent; they just happen to have the same name.

Most of the time, only one part of a typeglob is used (yes, it's a bit wasteful). You will by now know that you distinguish between them by using what the authors of the Camel book call a I<funny character>. So if we have a scalar called `C<line>', we would refer to it in code as C<$line>, and if we had an array of the same name, that would be written C<@line>. Both would point to the same typeglob (which would be called C<*line>), but because of the I<funny character> (also known as I<decoration>) perl won't confuse the two. Of course we might confuse ourselves, so some programmers don't ever use the same name for more than one type of variable.

Every global symbol is in some package's symbol table. To refer to a global symbol we could write the I<fully qualified> name, e.g. C<$main::line>. If we are in the same package as the symbol, we can omit the package name, e.g. C<$line> (unless you use the C<strict> pragma, in which case you will have to predeclare the variable using the C<vars> pragma). We can also omit the package name if we have imported the symbol into our current package's namespace. If we want to refer to a symbol that is in another package and which we haven't imported, we must use the fully qualified name, e.g. C<$otherpkg::box>.

Most of the time you do not need to use the fully qualified symbol name, because most of the time you will refer to package variables from within the package. This is very like C++ class variables. You can work entirely within package C<main::> and never even know you are using a package, nor that the symbols have package names. In a way, this is a pity, because you may fail to learn about packages, and they are extremely useful.

The exception is when you I<import> the variable from another package. This creates an alias for the variable in the I<current> package, so that you can access it without using the fully qualified name.
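Here is a tiny sketch tying these pieces together (the package and variable names are invented; it deliberately runs without C<strict>, so the globals need no predeclaration):

  #!/usr/bin/perl -w

  package mypkg;
  $line = "a scalar";                        # really $mypkg::line
  @line = ("an array", "of the same name");  # same *line typeglob, separate data

  package main;
  print "$mypkg::line\n";    # fully qualified access from main::
  print "@mypkg::line\n";    # the array slot is independent of the scalar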
This creates an alias for the variable in the I<current> package, so that
you can access it without using the fully qualified name.

Whilst global variables are useful for sharing data and are necessary in
some contexts, it is usually wisest to minimize their use and use
I<lexical variables>, discussed next, instead.

Note that when you create a variable, the low-level business of allocating
memory to store the information is handled automatically by Perl. The
interpreter keeps track of the chunks of memory to which the pointers are
pointing and takes care of undefining variables. When all references to a
variable have ceased to exist, the perl garbage collector is free to take
back the memory used, ready for recycling. However, perl almost never
returns memory it has already used to the operating system during the
lifetime of the process.

=head3 Lexical Variables and Symbols

The symbols for lexical variables (i.e. those declared using the keyword
C<my>) are the only symbols which do I<not> live in a symbol table.
Because of this, they are not available from outside the block in which
they are declared. There is no typeglob associated with a lexical
variable, and a lexical variable can refer only to a scalar, an array, a
hash or a code reference. (Since perl-5.6 it can also refer to a file
glob.)

If you need access to the data from outside the package, then you can
return it from a subroutine, or you can create a global variable (i.e. one
which has a package prefix) which points or refers to it, and return that.
The pointer or reference must be global so that you can refer to it by a
fully qualified name. But, just as in C, try to avoid having global
variables. Using OO methods generally solves this problem by providing
methods to get and set the desired value within the object, which can be
lexically scoped inside the package and passed by reference.

The phrase "lexical variable" is a bit of a misnomer; we are really
talking about "lexical symbols". The data can be referenced by a global
symbol too, and in such cases when the lexical symbol goes out of scope
the data will still be accessible through the global symbol. This is
perfectly legitimate and cannot be compared to the terrible mistake of
taking a pointer to an automatic C variable and returning it from a
function--when the pointer is dereferenced there will be a segmentation
fault. (Note for C/C++ programmers: having a function return a pointer to
an auto variable is a disaster in C or C++; the perl equivalent, returning
a reference to a lexical variable created in a function, is normal and
useful.)

=over

=item *

C<my()> vs. C<use vars>:

With use vars(), you are making an entry in the symbol table, and you are
telling the compiler that you are going to be referencing that entry
without an explicit package name.

With my(), NO ENTRY IS PUT IN THE SYMBOL TABLE. The compiler figures out
I<at compile time> which my() variables (i.e. lexical variables) are the
same as each other, and once you hit execute time you cannot go looking
those variables up in the symbol table.

=item *

C<my()> vs. C<local()>:

local() creates a temporally-limited package-based scalar, array, hash, or
glob -- when the scope of definition is exited at runtime, the previous
value (if any) is restored. References to such a variable are *also*
global... only the value changes. (Aside: that is what causes variable
suicide. :)
my() creates a lexically-limited non-package-based scalar, array, or hash
-- when the scope of definition is exited at compile-time, the variable
ceases to be accessible. Any references to such a variable at runtime turn
into unique anonymous variables on each scope exit.

=back

=head2 Additional reading references

For more information see: L<Using global variables and sharing them
between
modules/packages|general::perl_reference::perl_reference/Using_Global_Variables_and_Shari>
and an article by Mark-Jason Dominus about how Perl handles variables and
namespaces, and the difference between C<use vars()> and C<my()> -
http://www.plover.com/~mjd/perl/FAQs/Namespaces.html .

=head1 my() Scoped Variable in Nested Subroutines

Before we proceed, let's make the assumption that we want to develop the
code under the C<strict> pragma. We will use lexically scoped variables
(with the help of the my() operator) whenever possible.

=head2 The Poison

Let's look at this code:

  nested.pl
  -----------
  #!/usr/bin/perl

  use strict;

  sub print_power_of_2 {
    my $x = shift;

    sub power_of_2 {
      return $x ** 2;
    }

    my $result = power_of_2();
    print "$x^2 = $result\n";
  }

  print_power_of_2(5);
  print_power_of_2(6);

Don't let the weird subroutine names fool you; the print_power_of_2()
subroutine should print the square of the number passed to it. Let's run
the code and see whether it works:

  % ./nested.pl

  5^2 = 25
  6^2 = 25

Ouch, something is wrong. Maybe there is a bug in Perl and it doesn't work
correctly with the number 6? Let's try again using 5 and 7:

  print_power_of_2(5);
  print_power_of_2(7);

And run it:

  % ./nested.pl

  5^2 = 25
  7^2 = 25

Wow, does it work only for 5? How about using 3 and 5:

  print_power_of_2(3);
  print_power_of_2(5);

and the result is:

  % ./nested.pl

  3^2 = 9
  5^2 = 9

Now we start to understand--only the first call to the print_power_of_2()
function works correctly. This makes us think that our code has some kind
of memory for the results of the first execution, or that it ignores the
arguments in subsequent executions.

=head2 The Diagnosis

Let's follow the guidelines and use the C<-w> flag. Now execute the code:

  % ./nested.pl

  Variable "$x" will not stay shared at ./nested.pl line 9.
  5^2 = 25
  6^2 = 25

We have never seen such a warning message before and we don't quite
understand what it means. The C<diagnostics> pragma will certainly help
us. Let's enable this pragma before the C<strict> pragma in our code:

  #!/usr/bin/perl -w

  use diagnostics;
  use strict;

And execute it:

  % ./nested.pl

  Variable "$x" will not stay shared at ./nested.pl line 10 (#1)

    (W) An inner (nested) named subroutine is referencing a lexical
    variable defined in an outer subroutine.

    When the inner subroutine is called, it will probably see the value of
    the outer subroutine's variable as it was before and during the
    *first* call to the outer subroutine; in this case, after the first
    call to the outer subroutine is complete, the inner and outer
    subroutines will no longer share a common value for the variable. In
    other words, the variable will no longer be shared.

    Furthermore, if the outer subroutine is anonymous and references a
    lexical variable outside itself, then the outer and inner subroutines
    will never share the given variable.

    This problem can usually be solved by making the inner subroutine
    anonymous, using the sub {} syntax. When inner anonymous subs that
    reference variables in outer subroutines are called or referenced,
    they are automatically rebound to the current values of such
    variables.

  5^2 = 25
  6^2 = 25

Well, now everything is clear.
We have the B<inner> subroutine power_of_2() and the B<outer> subroutine
print_power_of_2() in our code. When the inner power_of_2() subroutine is
called for the first time, it sees the value of the outer
print_power_of_2() subroutine's C<$x> variable. On subsequent calls the
inner subroutine's C<$x> variable won't be updated, no matter what new
values are given to C<$x> in the outer subroutine. There are two copies of
the C<$x> variable, no longer a single one shared by the two routines.

=head2 The Remedy

The C<diagnostics> pragma suggests that the problem can be solved by
making the inner subroutine anonymous.

An anonymous subroutine can act as a I<closure> with respect to lexically
scoped variables. Basically this means that if you define a subroutine in
a particular B<lexical> context at a particular moment, then it will run
in that same context later, even if called from outside that context. The
upshot of this is that when the subroutine B<runs>, you get the same
copies of the lexically scoped variables which were visible when the
subroutine was B<defined>. So you can pass arguments to a function when
you define it, as well as when you invoke it.

Let's rewrite the code to use this technique:

  anonymous.pl
  --------------
  #!/usr/bin/perl

  use strict;

  sub print_power_of_2 {
    my $x = shift;

    my $func_ref = sub {
      return $x ** 2;
    };

    my $result = &$func_ref();
    print "$x^2 = $result\n";
  }

  print_power_of_2(5);
  print_power_of_2(6);

Now C<$func_ref> contains a reference to an anonymous subroutine, which we
later use when we need to get the power of two. Since it is anonymous, the
subroutine will automatically be rebound to the new value of the outer
scoped variable C<$x>, and the results will now be as expected.

Let's verify:

  % ./anonymous.pl

  5^2 = 25
  6^2 = 36

So we can see that the problem is solved.

=head1 Understanding Closures -- the Easy Way

In Perl, a closure is just a subroutine that refers to one or more lexical
variables declared outside the subroutine itself; Perl must therefore
create a distinct clone of the environment for each instance on the way
out. Both named subroutines and anonymous subroutines can be closures.

Here's how to tell if a subroutine is a closure or not:

  for (1..5) {
    push @a, sub { "hi there" };
  }

  for (1..5) {
    {
      my $b;
      push @b, sub { $b."hi there" };
    }
  }

  print "anon normal:\n", join "\t\n", @a, "\n";
  print "anon closure:\n", join "\t\n", @b, "\n";

which generates:

  anon normal:
  CODE(0x80568e4)
  CODE(0x80568e4)
  CODE(0x80568e4)
  CODE(0x80568e4)
  CODE(0x80568e4)

  anon closure:
  CODE(0x804b4c0)
  CODE(0x8056b54)
  CODE(0x8056bb4)
  CODE(0x80594d8)
  CODE(0x8059538)

Note how each code reference from the non-closure is identical, but the
closure form must generate distinct coderefs to point at the distinct
instances of the closure.

And now the same with named subroutines:

  for (1..5) {
    sub a { "hi there" };
    push @a, \&a;
  }

  for (1..5) {
    {
      my $b;
      sub b { $b."hi there" };
      push @b, \&b;
    }
  }

  print "normal:\n", join "\t\n", @a, "\n";
  print "closure:\n", join "\t\n", @b, "\n";

which generates:

  normal:
  CODE(0x80568c0)
  CODE(0x80568c0)
  CODE(0x80568c0)
  CODE(0x80568c0)
  CODE(0x80568c0)

  closure:
  CODE(0x8056998)
  CODE(0x8056998)
  CODE(0x8056998)
  CODE(0x8056998)
  CODE(0x8056998)

We can see that both subroutines generated the same code reference on
every iteration. For the subroutine I<a> that's easy, since it doesn't
include any lexical variables defined outside it in the same lexical
scope. As for the subroutine I<b>, it's indeed a closure, but Perl won't
recompile it since it's a named subroutine (see the I<perlsub> manpage).
This is something that we don't want to happen in our code unless we want
it for this special effect, similar to I<static> variables in C. This is
the underpinning of that famous I<"won't stay shared"> message: a named
subroutine that uses a I<my> variable keeps generating the identical code
reference, and therefore ignores any future changes to the lexical
variables outside of it.

=head2 Mike Guy's Explanation of the Inner Subroutine Behavior

  From: [EMAIL PROTECTED] (M.J.T. Guy)
  Newsgroups: comp.lang.perl.misc
  Subject: Re: Lexical scope and embedded subroutines.
  Date: 6 Jan 1998 18:22:39 GMT
  Message-ID: <[EMAIL PROTECTED]>

In article <[EMAIL PROTECTED]>, Aaron Harsh <[EMAIL PROTECTED]> wrote:

  > Before I read this thread (and perlsub to get the details) I would
  > have assumed the original code was fine.
  >
  > This behavior brings up the following questions:
  >   o Is Perl's behavior some sort of speed optimization?

No, but see below.

  >   o Did the Perl gods just decide that scheme-like behavior was less
  >     important than the pseudo-static variables described in perlsub?

This subject has been kicked about at some length on perl5-porters. The
current behaviour was chosen as the best of a bad job. In the context of
Perl, it's not obvious what "scheme-like behavior" means. So it isn't an
option. See below for details.

  >   o Does anyone else find Perl's behavior counter-intuitive?

*Everyone* finds it counterintuitive. The fact that it only generates a
warning rather than a hard error is part of the Perl Gods' policy of
hurling thunderbolts at those so irreverent as not to use -w.

  >   o Did programming in scheme destroy my ability to judge a decent
  >     language feature?

You're still interested in Perl, so it can't have rotted your brain
completely.

  >   o Have I misremembered how scheme handles these situations?

Probably not.

  >   o Do Perl programmers really care how much Perl acts like scheme?

Some do.

  >   o Should I have stopped this message two or three questions ago?

Yes.

The problem to be solved can be stated as "When a subroutine refers to a
variable which is instantiated more than once (i.e. the variable is
declared in a for loop, or in a subroutine), which instance of that
variable should be used?"

The basic problem is that Perl isn't Scheme (or Pascal or any of the other
comparators that have been used). In almost all lexically scoped languages
(i.e. those in the Algol60 tradition), named subroutines are also
lexically scoped. So the scope of the subroutine is necessarily contained
in the scope of any external variable referred to inside the subroutine.
So there's an obvious answer to the "which instance?" problem.

But in Perl, named subroutines are globally scoped. (But in some future
Perl, you'll be able to write

  my sub lex { ... }

to get lexical scoping.) So the solution adopted by other languages can't
be used.

The next suggestion most people come up with is "Why not use the most
recently instantiated variable?". This Does The Right Thing in many cases,
but fails when recursion or other complications are involved. Consider:

  sub outer {
    inner();
    outer();
    my $trouble;
    inner();
    sub inner { $trouble }
    outer();
    inner();
  }

Which instance of $trouble is to be used for each call of inner()? And
why?

The consensus was that an incomplete solution was unacceptable, so the
simple rule "Use the first instance" was adopted instead. And it is more
efficient than possible alternative rules. But that's not why it was done.
  Mike Guy

=head1 When You Cannot Get Rid of The Inner Subroutine

First you might wonder, why in the world would someone need to define an
inner subroutine? Well, for example, to reduce some of Perl's script
startup overhead you might decide to write a daemon that will compile the
scripts and modules only once, and cache the pre-compiled code in memory.
When some script is to be executed, you just tell the daemon the name of
the script to run, and it will do the rest, much faster since compilation
has already taken place.

Seems like an easy task, and it is. The only problem is that once the
script is compiled, how do you execute it? Or let's put it the other way:
after it was executed for the first time and it stays compiled in the
daemon's memory, how do you call it again? If you could get all developers
to code their scripts so that each has a subroutine called run() that will
actually execute the code in the script, then we've solved half the
problem.

But how does the daemon know to refer to some specific script if they all
run in the C<main::> name space? One solution might be to ask the
developers to declare a package in each and every script, and for the
package name to be derived from the script name. However, since there is a
chance that there will be more than one script with the same name but
residing in different directories, in order to prevent namespace
collisions the directory has to be a part of the package name too. And
don't forget that the script may be moved from one directory to another,
so you will have to make sure that the package name is corrected every
time the script gets moved.

But why enforce these strange rules on developers, when we can arrange for
our daemon to do this work? For every script that the daemon is about to
execute for the first time, the script should be wrapped inside a package
whose name is constructed from the mangled path to the script, and inside
a subroutine called run(). For example if the daemon is about to execute
the script I</tmp/hello.pl>:

  hello.pl
  --------
  #!/usr/bin/perl
  print "Hello\n";

Prior to running it, the daemon will change the code to be:

  wrapped_hello.pl
  ----------------
  package cache::tmp::hello_2epl;

  sub run{
    #!/usr/bin/perl
    print "Hello\n";
  }

The package name is constructed from the prefix C<cache::>, each directory
separation slash is replaced with C<::>, and non-alphanumeric characters
are encoded so that for example C<.> (a dot) becomes C<_2e> (an underscore
followed by the ASCII code for a dot in hex representation).

  % perl -e 'printf "%x",ord(".")'

prints: C<2e>. The encoding is the same as you see in URL encoding, except
that there the C<%> character is used (C<%2E>); but since C<%> has a
special meaning in Perl (the prefix of hash variables) it couldn't be used
here.

Now when the daemon is requested to execute the script I</tmp/hello.pl>,
all it has to do is to build the package name as before, based on the
location of the script, and call its run() subroutine:

  use cache::tmp::hello_2epl;
  cache::tmp::hello_2epl::run();

We have just written a partial prototype of the daemon we wanted. The only
outstanding problem is how to pass the path to the script to the daemon.
This detail is left as an exercise for the reader.

If you are familiar with the C<Apache::Registry> module, you know that it
works in almost the same way. It uses a different package prefix, and the
generic function is called handler() and not run(). The scripts to run are
passed through the HTTP protocol's headers.
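The name mangling itself takes only a few lines of code. Here is a minimal
sketch of such a helper; the function name path2package() is made up for
this illustration and is not part of C<Apache::Registry>:

  # turn a script's path into a unique package name,
  # e.g. /tmp/hello.pl => cache::tmp::hello_2epl
  sub path2package {
    my $path = shift;
    $path =~ s|^/||;                                  # drop the leading slash
    $path =~ s/([^\w\/])/sprintf "_%02x", ord $1/ge;  # encode non-alphanumerics
    $path =~ s|/|::|g;                                # slashes become ::
    return "cache::$path";
  }

  print path2package("/tmp/hello.pl"), "\n"; # prints: cache::tmp::hello_2epl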
Now you understand that there are cases where your normal subroutines can
become inner, since if your script was a simple:

  simple.pl
  ---------
  #!/usr/bin/perl
  sub hello { print "Hello" }
  hello();

Wrapped into a run() subroutine it becomes:

  simple.pl
  ---------
  package cache::simple_2epl;

  sub run{
    #!/usr/bin/perl
    sub hello { print "Hello" }
    hello();
  }

Therefore, hello() is now an inner subroutine, and if you have my() scoped
variables defined and altered outside hello() but used inside it, it won't
work as you expect from the second call onwards, as was explained in the
previous section.

=head2 Remedies for Inner Subroutines

First of all, there is nothing to worry about as long as you don't forget
to turn warnings on. If you do happen to have the "L<my() Scoped Variable
in Nested
Subroutines|general::perl_reference::perl_reference/my_Scoped_Variable_in_Nested_S>"
problem, Perl will always alert you.

Given that you have a script that has this problem, what are the ways to
solve it? There are many of them, and we will discuss some of them here.

We will use the following code to show the different solutions.

  multirun.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run{
    my $counter = 0;

    increment_counter();
    increment_counter();

    sub increment_counter{
      $counter++;
      print "Counter is equal to $counter !\n";
    }
  } # end of sub run

This code executes the run() subroutine three times, which in turn
initializes the C<$counter> variable to 0 every time it is executed, and
then calls the inner subroutine increment_counter() twice. Sub
increment_counter() prints C<$counter>'s value after incrementing it. One
might expect to see the following output:

  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 3]
  Counter is equal to 1 !
  Counter is equal to 2 !

But as we have already learned from the previous sections, this is not
what we are going to see. Indeed, when we run the script we see:

  % ./multirun.pl

  Variable "$counter" will not stay shared at ./multirun.pl line 18.
  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 3 !
  Counter is equal to 4 !
  run: [time 3]
  Counter is equal to 5 !
  Counter is equal to 6 !

Obviously, the C<$counter> variable is not reinitialized on each execution
of run(). It retains its value from the previous execution, and sub
increment_counter() increments that.

One of the workarounds is to use globally declared variables, with the
C<vars> pragma:

  multirun1.pl
  -----------
  #!/usr/bin/perl -w

  use strict;
  use vars qw($counter);

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {
    $counter = 0;

    increment_counter();
    increment_counter();

    sub increment_counter{
      $counter++;
      print "Counter is equal to $counter !\n";
    }
  } # end of sub run

If you run this and the other solutions offered below, the expected output
will be generated:

  % ./multirun1.pl

  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 3]
  Counter is equal to 1 !
  Counter is equal to 2 !

By the way, the warning we saw before has gone, and so has the problem,
since there is no C<my()> (lexically defined) variable used in the nested
subroutine.

Another approach is to use fully qualified variables.
This is better, since less memory will be used, but it adds a typing
overhead:

  multirun2.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {
    $main::counter = 0;

    increment_counter();
    increment_counter();

    sub increment_counter{
      $main::counter++;
      print "Counter is equal to $main::counter !\n";
    }
  } # end of sub run

You can also pass the variable to the subroutine by value and make the
subroutine return it after it has been updated. This adds time and memory
overheads, so it may not be a good idea if the variable can be very large,
or if speed of execution is an issue. Don't rely on the variable being
small during the development of the application; it can grow quite big in
situations you don't expect. For example, a very simple HTML form text
entry field can return a few megabytes of data if one of your users is
bored and wants to test how good your code is. It's not uncommon to see
users copy-and-paste 10Mb core dump files into a form's text fields and
then submit them for your script to process.

  multirun3.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {
    my $counter = 0;

    $counter = increment_counter($counter);
    $counter = increment_counter($counter);

    sub increment_counter{
      my $counter = shift;

      $counter++;
      print "Counter is equal to $counter !\n";

      return $counter;
    }
  } # end of sub run

Finally, you can use references to do the job. The version of
increment_counter() below accepts a reference to the C<$counter> variable
and increments its value after first dereferencing it. When you use a
reference, the variable you use inside the function is physically the same
bit of memory as the one outside the function. This technique is often
used to enable a called function to modify variables in a calling
function.

  multirun4.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {
    my $counter = 0;

    increment_counter(\$counter);
    increment_counter(\$counter);

    sub increment_counter{
      my $r_counter = shift;

      $$r_counter++;
      print "Counter is equal to $$r_counter !\n";
    }
  } # end of sub run

Here is yet another, more obscure approach. We modify the value of
C<$counter> inside the subroutine by using the fact that variables in
C<@_> are aliases for the actual scalar parameters. Thus if you called a
function with two arguments, those would be stored in C<$_[0]> and
C<$_[1]>. In particular, if an element C<$_[0]> is updated, the
corresponding argument is updated (or an error occurs if it is not
updatable, as would be the case when calling the function with a literal,
e.g. I<increment_counter(5)>).

  multirun5.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {
    my $counter = 0;

    increment_counter($counter);
    increment_counter($counter);

    sub increment_counter{
      $_[0]++;
      print "Counter is equal to $_[0] !\n";
    }
  } # end of sub run

Of course, the approach given above should be properly documented.

Here is a solution that avoids the problem entirely by splitting the code
into two files; the first is really just a wrapper and loader, the second
file contains the heart of the code.
  multirun6.pl
  -----------
  #!/usr/bin/perl -w

  use strict;
  require 'multirun6-lib.pl';

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

Separate file:

  multirun6-lib.pl
  ----------------
  use strict;

  my $counter;

  sub run {
    $counter = 0;

    increment_counter();
    increment_counter();
  }

  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !\n";
  }

  1;

Now you have at least six workarounds to choose from. For more information
please refer to the perlref and perlsub manpages.

=head1 use(), require(), do(), %INC and @INC Explained

=head2 The @INC array

C<@INC> is a special Perl variable which is the equivalent of the shell's
C<PATH> variable. Whereas C<PATH> contains a list of directories to search
for executables, C<@INC> contains a list of directories from which Perl
modules and libraries can be loaded.

When you use(), require() or do() a filename or a module, Perl gets a list
of directories from the C<@INC> variable and searches them for the file it
was requested to load. If the file that you want to load is not located in
one of the listed directories, you have to tell Perl where to find the
file. You can either provide a path relative to one of the directories in
C<@INC>, or you can provide the full path to the file.

=head2 The %INC hash

C<%INC> is another special Perl variable that is used to cache the names
of the files and the modules that were successfully loaded and compiled by
use(), require() or do() statements. Before attempting to load a file or a
module with use() or require(), Perl checks whether it's already in the
C<%INC> hash. If it's there, the loading and therefore the compilation are
not performed at all. Otherwise the file is loaded into memory and an
attempt is made to compile it. do() does unconditional loading--no lookup
in the C<%INC> hash is made.

If the file is successfully loaded and compiled, a new key-value pair is
added to C<%INC>. The key is the name of the file or module as it was
passed to one of the three functions we have just mentioned, and if it was
found in any of the C<@INC> directories except C<"."> the value is the
full path to it in the file system.

The following examples will make it easier to understand the logic.

First, let's see what the contents of C<@INC> are on my system:

  % perl -e 'print join "\n", @INC'

  /usr/lib/perl5/5.00503/i386-linux
  /usr/lib/perl5/5.00503
  /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005
  .

Notice that C<.> (the current directory) is the last directory in the
list.

Now let's load the module C<strict.pm> and see the contents of C<%INC>:

  % perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'

  strict.pm => /usr/lib/perl5/5.00503/strict.pm

Since C<strict.pm> was found in the I</usr/lib/perl5/5.00503/> directory
and I</usr/lib/perl5/5.00503/> is a part of C<@INC>, C<%INC> includes the
full path as the value for the key C<strict.pm>.

Now let's create the simplest module in C</tmp/test.pm>:

  test.pm
  -------
  1;

It does nothing, but returns a true value when loaded. Now let's load it
in different ways:

  % cd /tmp
  % perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => test.pm

Since the file was found relative to C<.> (the current directory), the
relative path is inserted as the value. If we alter C<@INC> by adding
I</tmp> to the end:

  % cd /tmp
  % perl -e 'BEGIN{push @INC, "/tmp"} use test; \
    print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => test.pm

Here we still get the relative path, since the module was found first
relative to C<".">.
The directory I</tmp> was placed after C<.> in the list. If we execute the
same code from a different directory, the C<"."> directory won't match,

  % cd /
  % perl -e 'BEGIN{push @INC, "/tmp"} use test; \
    print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => /tmp/test.pm

so we get the full path. We can also prepend the path with unshift(), so
that it will be used for matching before C<"."> and therefore we will get
the full path as well:

  % cd /tmp
  % perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
    print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => /tmp/test.pm

The code:

  BEGIN{unshift @INC, "/tmp"}

can be replaced with the more elegant:

  use lib "/tmp";

which is almost equivalent to our C<BEGIN> block and is the recommended
approach.

These approaches to modifying C<@INC> can be labor intensive, since if you
want to move the script around in the file-system you have to modify the
path. This can be painful, for example, when you move your scripts from
development to a production server.

There is a module called C<FindBin> which solves this problem in the plain
Perl world, but unfortunately it won't work under mod_perl, since it's a
module and, like any module, it's loaded only once. So the first script
using it will have all the settings correct, but the rest of the scripts
will not, if located in a different directory from the first. For the sake
of completeness, I'll present this module anyway.

If you use this module, you don't need to write a hard-coded path. The
following snippet does all the work for you (the file is I</tmp/load.pl>):

  load.pl
  -------
  #!/usr/bin/perl

  use FindBin ();
  use lib "$FindBin::Bin";
  use test;
  print "test.pm => $INC{'test.pm'}\n";

In the above example C<$FindBin::Bin> is equal to I</tmp>. If we move the
script somewhere else, e.g. I</tmp/new_dir>, then C<$FindBin::Bin> equals
I</tmp/new_dir>.

  % /tmp/load.pl

  test.pm => /tmp/test.pm

This is just like C<use lib>, except that no hard-coded path is required.

You can use this workaround to make it work under mod_perl:

  do 'FindBin.pm';
  unshift @INC, "$FindBin::Bin";
  require test;
  #maybe test::import( ... ) here if need to import stuff

This has a slight overhead because it will load from disk and recompile
the C<FindBin> module on each request. So it may not be worth it.

=head2 Modules, Libraries and Program Files

Before we proceed, let's define what we mean by I<module>, I<library> and
I<program file>.

=over

=item * Libraries

These are files which contain Perl subroutines and other code.

When these are used to break up a large program into manageable chunks,
they don't generally include a package declaration; when they are used as
subroutine libraries, they often do have a package declaration. Their last
statement returns true; a simple C<1;> statement ensures that.

They can be named in any way desired, but generally their extension is
I<.pl>.

Examples:

  config.pl
  ----------
  # No package so defaults to main::
  $dir = "/home/httpd/cgi-bin";
  $cgi = "/cgi-bin";
  1;

  mysubs.pl
  ----------
  # No package so defaults to main::
  sub print_header{
    print "Content-type: text/plain\r\n\r\n";
  }
  1;

  web.pl
  ------------
  package web;
  # Call like this: web::print_with_class('loud',"Don't shout!");
  sub print_with_class{
    my( $class, $text ) = @_;
    print qq{<span class="$class">$text</span>};
  }
  1;

=item * Modules

A file which contains perl subroutines and other code. It generally
declares a package name at the beginning of it.
Modules are generally used either as function libraries (for which I<.pl>
files are still, but less commonly, used), or as object libraries where a
module is used to define a class and its methods.

Its last statement returns true.

The naming convention requires it to have a I<.pm> extension.

Example:

  MyModule.pm
  -----------
  package My::Module;
  $My::Module::VERSION = 0.01;

  sub new{ return bless {}, shift;}
  END { print "Quitting\n"}
  1;

=item * Program Files

Many Perl programs exist as a single file. Under Linux and other Unix-like
operating systems the file often has no suffix, since the operating system
can determine that it is a perl script from the first line (the shebang
line); if it's Apache that executes the code, there is a variety of ways
to tell how and when the file should be executed. Under Windows a suffix
is normally used, for example C<.pl> or C<.plx>.

The program file will normally C<require()> any libraries and C<use()> any
modules it requires for execution.

It will contain Perl code but won't usually have any package names.

Its last statement may return anything or nothing.

=back

=head2 require()

require() reads a file containing Perl code and compiles it. Before
attempting to load the file it looks up the argument in C<%INC> to see
whether it has already been loaded. If it has, require() just returns
without doing a thing. Otherwise an attempt will be made to load and
compile the file.

require() has to find the file it is asked to load. If the argument is a
full path to the file, it just tries to read it. For example:

  require "/home/httpd/perl/mylibs.pl";

If the path is relative, require() will attempt to search for the file in
all the directories listed in C<@INC>. For example:

  require "mylibs.pl";

If there is more than one occurrence of the file with the same name in the
directories listed in C<@INC>, the first occurrence will be used.

The file must return I<TRUE> as the last statement to indicate successful
execution of any initialization code. Since you never know what changes
the file will go through in the future, you cannot be sure that the last
statement will always return I<TRUE>. That's why the suggestion is to put
"C<1;>" at the end of the file.

Although you should use the real filename for most files, if the file is a
L<module|general::perl_reference::perl_reference/Modules__Libraries_and_Program_Files>,
you may use the following convention instead:

  require My::Module;

This is equivalent to:

  require "My/Module.pm";

If require() fails to load the file, either because it couldn't find the
file in question, the code failed to compile, or it didn't return I<TRUE>,
then the program will die(). To prevent this, the require() statement can
be enclosed in an eval() exception-handling block, as in this example:

  require.pl
  ----------
  #!/usr/bin/perl -w

  eval { require "/file/that/does/not/exists"};

  if ($@) {
    print "Failed to load, because : $@"
  }

  print "\nHello\n";

When we execute the program:

  % ./require.pl

  Failed to load, because : Can't locate /file/that/does/not/exists in
  @INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux
  /usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005 .) at require.pl line 3.

  Hello

We see that the program didn't die(), because I<Hello> was printed. This
I<trick> is useful when you want to check whether a user has some module
installed; if she hasn't, it's not critical: perhaps the program can run
without this module, with reduced functionality.
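Here is a minimal sketch of this trick in action. The optional module and
the image file are made-up examples; any module whose absence you can
tolerate would do:

  #!/usr/bin/perl -w

  use strict;

  # try to load the optional module, but don't die if it's missing
  my $have_image_size = eval { require Image::Size; 1 } ? 1 : 0;

  if ($have_image_size) {
    # full functionality: report the image dimensions
    my ($w, $h) = Image::Size::imgsize("logo.png");
    print "logo.png is ${w}x${h}\n";
  }
  else {
    # reduced functionality: skip the dimensions
    print "Image::Size is not installed, skipping image dimensions\n";
  }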
If we remove the eval() part and try again:

  require1.pl
  ----------
  #!/usr/bin/perl -w

  require "/file/that/does/not/exists";
  print "\nHello\n";

  % ./require1.pl

  Can't locate /file/that/does/not/exists in @INC (@INC contains:
  /usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503
  /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005 .) at require1.pl line 3.

The program just die()s in the last example, which is what you want in
most cases.

For more information refer to the perlfunc manpage.

=head2 use()

use(), just like require(), loads and compiles files containing Perl code,
but it works with
L<modules|general::perl_reference::perl_reference/Modules__Libraries_and_Program_Files>
only and is executed at compile time. The only way to specify a module to
load is by its module name and not its filename. If the module is located
in I<MyCode.pm>, the correct way to use() it is:

  use MyCode

and not:

  use "MyCode.pm"

use() translates the passed argument into a file name, replacing C<::>
with the operating system's path separator (normally C</>) and appending
I<.pm> at the end. So C<My::Module> becomes I<My/Module.pm>.

use() is exactly equivalent to:

  BEGIN { require Module; Module->import(LIST); }

Internally it calls require() to do the loading and compilation chores.
When require() finishes its job, import() is called unless C<()> is the
second argument. The following pairs are equivalent:

  use MyModule;
  BEGIN {require MyModule; MyModule->import; }

  use MyModule qw(foo bar);
  BEGIN {require MyModule; MyModule->import("foo","bar"); }

  use MyModule ();
  BEGIN {require MyModule; }

The first pair exports the default tags. This happens if the module sets
C<@EXPORT> to a list of tags to be exported by default. The module's
manpage normally describes what tags are exported by default.

The second pair exports only the tags passed as arguments.

The third pair describes the case where the caller does not want any
symbols to be imported.

C<import()> is not a builtin function; it's just an ordinary static method
call into the C<MyModule> package to tell the module to import the list of
features back into the current package. See the Exporter manpage for more
information.

When you write your own modules, always remember that it's better to use
C<@EXPORT_OK> instead of C<@EXPORT>, since the former doesn't export
symbols unless asked to. Exports pollute the namespace of the module user.
Also avoid short or common symbol names to reduce the risk of name
clashes.

When functions and variables aren't exported you can still access them
using their full names, like C<$My::Module::bar> or C<My::Module::foo()>.
By convention you can use a leading underscore on names to informally
indicate that they are I<internal> and not for public use.

There's a corresponding "C<no>" command that un-imports symbols imported
by C<use>, i.e., it calls C<Module-E<gt>unimport(LIST)> instead of
C<import()>.

=head2 do()

While do() behaves almost identically to require(), it reloads the file
unconditionally. It doesn't check C<%INC> to see whether the file was
already loaded.

If do() cannot read the file, it returns C<undef> and sets C<$!> to report
the error. If do() can read the file but cannot compile it, it returns
C<undef> and puts an error message in C<$@>. If the file is successfully
compiled, do() returns the value of the last expression evaluated.
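Because do() bypasses the C<%INC> cache, it is handy when you actually
want a file re-read on every call, for example to pick up changes in a
simple configuration library. A minimal sketch, assuming a
I<config.pl>-style library like the one shown earlier that ends with a
true value (the path here is made up):

  # reload the configuration on every call; do() skips %INC,
  # so the file is read and recompiled each time
  my $config = "/home/httpd/perl/config.pl";
  do $config or die "Failed to load $config: " . ($@ || $!);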
=head1 Using Global Variables and Sharing Them Between Modules/Packages

It helps when you code your application in a structured way, using the
perl packages, but as you probably know, once you start using packages
it's much harder to share the variables between the various packages. A
configuration package comes to mind as a good example of a package whose
variables should be accessible from other modules.

Of course, using Object Oriented (OO) programming is the best way to
provide access to variables, through access methods. But if you are not
yet ready for OO techniques, you can still benefit from the techniques we
are going to talk about.

=head2 Making Variables Global

When you first wrote C<$x> in your code you created a (package) global
variable. It is visible everywhere in your program, although if used in a
package other than the package in which it was declared (C<main::> by
default), it must be referred to with its fully qualified name, unless you
have imported this variable with import(). This will work only if you do
not use the C<strict> pragma; but you I<have> to use this pragma if you
want to run your scripts under mod_perl. Read L<The strict
pragma|guide::porting/The_strict_pragma> to find out why.

=head2 Making Variables Global With strict Pragma On

First you use:

  use strict;

Then you use:

  use vars qw($scalar %hash @array);

This declares the named variables as package globals in the current
package. They may be referred to within the same file and package with
their unqualified names, and in different files/packages with their fully
qualified names.

With perl5.6 you can use the C<our> operator instead:

  our($scalar, %hash, @array);

If you want to share package global variables between packages, here is
what you can do.

=head2 Using Exporter.pm to Share Global Variables

Assume that you want to share the C<CGI.pm> object (I will use C<$q>)
between your modules. For example, you create it in C<script.pl>, but you
want it to be visible in C<My::HTML>. First, you make C<$q> global.

  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.);
  use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
  $q = CGI->new;

  My::HTML::printmyheader();

Note that we have imported C<$q> from C<My::HTML>. And C<My::HTML> does
the export of C<$q>:

  My/HTML.pm
  ----------------
  package My::HTML;
  use strict;

  BEGIN {
    use Exporter ();

    @My::HTML::ISA       = qw(Exporter);
    @My::HTML::EXPORT    = qw();
    @My::HTML::EXPORT_OK = qw($q);
  }

  use vars qw($q);

  sub printmyheader{
    # Whatever you want to do with $q... e.g.
    print $q->header();
  }

  1;

So the C<$q> is shared between the C<My::HTML> package and C<script.pl>.
It will work vice versa as well, if you create the object in C<My::HTML>
but use it in C<script.pl>. You have true sharing, since if you change
C<$q> in C<script.pl>, it will be changed in C<My::HTML> as well.

What if you need to share C<$q> between more than two packages? For
example you want My::Doc to share C<$q> as well.

You leave C<My::HTML> untouched, and modify I<script.pl> to include:

  use My::Doc qw($q);

Then you add the same C<Exporter> code that we used in C<My::HTML>, into
C<My::Doc>, so that it also exports C<$q>.

One possible pitfall is when you want to use C<My::Doc> in both
C<My::HTML> and I<script.pl>. Only if you add

  use My::Doc qw($q);

into C<My::HTML> will C<$q> be shared. Otherwise C<My::Doc> will not share
C<$q> any more.
To make things clear here is the code:

  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.);
  use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
  use My::Doc  qw($q); # Ditto
  $q = new CGI;

  My::HTML::printmyheader();

  My/HTML.pm
  ----------------
  package My::HTML;
  use strict;

  BEGIN {
    use Exporter ();

    @My::HTML::ISA       = qw(Exporter);
    @My::HTML::EXPORT    = qw();
    @My::HTML::EXPORT_OK = qw($q);
  }

  use vars    qw($q);
  use My::Doc qw($q);

  sub printmyheader{
    # Whatever you want to do with $q... e.g.
    print $q->header();

    My::Doc::printtitle('Guide');
  }

  1;

  My/Doc.pm
  ----------------
  package My::Doc;
  use strict;

  BEGIN {
    use Exporter ();

    @My::Doc::ISA       = qw(Exporter);
    @My::Doc::EXPORT    = qw();
    @My::Doc::EXPORT_OK = qw($q);
  }

  use vars qw($q);

  sub printtitle{
    my $title = shift || 'None';

    print $q->h1($title);
  }

  1;

=head2 Using the Perl Aliasing Feature to Share Global Variables

As the title says, you can import a variable into a script or module
without using C<Exporter.pm>. I have found it useful to keep all the
configuration variables in one module, C<My::Config>. But then I have to
export all the variables in order to use them in other modules, which is
bad for two reasons: it pollutes other packages' namespaces with extra
symbols, which increases the memory requirements; and it adds the overhead
of keeping track of which variables should be exported from the
configuration module, and which imported, for some particular package.

I solve this problem by keeping all the variables in one hash C<%c> and
exporting that. Here is an example of C<My::Config>:

  package My::Config;
  use strict;
  use vars qw(%c);
  %c = (
    # All the configs go here
    scalar_var => 5,

    array_var  => [qw(foo bar)],

    hash_var   => {
                   foo => 'Foo',
                   bar => 'BARRR',
                  },
  );
  1;

Now in packages that want to use the configuration variables I have either
to use the fully qualified names like C<$My::Config::test>, which I
dislike, or import them as described in the previous section. But hey,
since we have only one variable to handle, we can make things even simpler
and save the loading of the C<Exporter.pm> package. We will use the Perl
aliasing feature for exporting and saving the keystrokes:

  package My::HTML;
  use strict;
  use lib qw(.);
    # Global Configuration now aliased to global %c
  use My::Config (); # My/Config.pm in the same dir as script.pl
  use vars qw(%c);
  *c = \%My::Config::c;

  # Now you can access the variables from the My::Config
  print $c{scalar_var};
  print $c{array_var}[0];
  print $c{hash_var}{foo};

Of course C<%c> is global everywhere you use it as described above, and if
you change it somewhere it will affect any other packages you have aliased
C<%My::Config::c> to.

Note that aliases work only with global or local() variables--you cannot
write:

  my *c = \%My::Config::c; # ERROR!

But you can write:

  local *c = \%My::Config::c;

For more information about aliasing, refer to the Camel book, second
edition, pages 51-52.

=head2 Using Non-Hardcoded Configuration Module Names

You have just seen how to use a configuration module for configuration
centralization and easy access to the information stored in it. However,
there is somewhat of a chicken-and-egg problem--how do you let your other
modules know the name of this module? Hardcoding the name is brittle--if
you have only a single project it should be fine, but if you have more
projects which use different configurations and you want to reuse their
code, you will have to find all instances of the hardcoded name and
replace them.
Another solution could be to use the same name for the configuration
module, e.g. C<My::Config>, but to put a different copy of it into
different locations. But this won't work under mod_perl because of the
namespace collision: you cannot load different modules which use the same
name; only the first one will be loaded.

Luckily, there is another solution which allows us to stay flexible.
C<PerlSetVar> comes to the rescue. Just as with environment variables, you
can set the server's global Perl variables, which can be retrieved from
any module and script. Those statements are placed into the I<httpd.conf>
file. For example:

  PerlSetVar FooBaseDir       /home/httpd/foo
  PerlSetVar FooConfigModule  Foo::Config

Now we require() the file where the above configuration will be used:

  PerlRequire /home/httpd/perl/startup.pl

In the I<startup.pl> we might have the following code:

  # retrieve the configuration module path
  use Apache;
  my $s = Apache->server;
  my $base_dir      = $s->dir_config('FooBaseDir')      || '';
  my $config_module = $s->dir_config('FooConfigModule') || '';
  die "FooBaseDir and FooConfigModule aren't set in httpd.conf"
    unless $base_dir and $config_module;

  # build the real path to the config module
  my $path = "$base_dir/$config_module";
  $path =~ s|::|/|g;
  $path .= ".pm";
  # we have something like "/home/httpd/foo/Foo/Config.pm"

  # now we can pull in the configuration module
  require $path;

Now we know the module name and it's loaded, so for example if we need to
use some variables stored in this module to open a database connection, we
will do:

  Apache::DBI->connect_on_init
  ("DBI:mysql:${$config_module.'::DB_NAME'}:${$config_module.'::SERVER'}",
   ${$config_module.'::USER'},
   ${$config_module.'::USER_PASSWD'},
   {
    PrintError => 1, # warn() on errors
    RaiseError => 0, # don't die on error
    AutoCommit => 1, # commit executes immediately
   }
  );

Variables like:

  ${$config_module.'::USER'}

are in our example really:

  $Foo::Config::USER

If you want to access these variables from within your code at run time,
instead of accessing the server object C<$s>, use the request object
C<$r>:

  my $r = shift;
  my $base_dir      = $r->dir_config('FooBaseDir')      || '';
  my $config_module = $r->dir_config('FooConfigModule') || '';

=head1 The Scope of the Special Perl Variables

Special Perl variables like C<$|> (buffering), C<$^T> (script's start
time), C<$^W> (warnings mode), C<$/> (input record separator), C<$\>
(output record separator) and many more are all true global variables;
they do not belong to any particular package (not even C<main::>) and are
universally available. This means that if you change them, you change them
anywhere across the entire program; furthermore you cannot scope them with
my(). However, you can local()ize them, which means that any changes you
apply will only last until the end of the enclosing scope. In the mod_perl
situation, where the child server doesn't usually exit, if in one of your
scripts you modify a global variable, it will be changed for the rest of
the process' life and will affect all the scripts executed by the same
process. Therefore, localizing these variables is highly recommended; I'd
say mandatory.

We will demonstrate the case on the input record separator variable. If
you undefine this variable, the diamond operator (readline) will suck in
the whole file at once, if you have enough memory. Remembering this you
should never write code like the example below:

  $/ = undef; # BAD!
  open IN, "file" ....
  # slurp it all into a variable
  $all_the_file = <IN>;

The proper way is to put the local() keyword before the special variable
is changed, like this:

  local $/ = undef;
  open IN, "file" ....
  # slurp it all inside a variable
  $all_the_file = <IN>;

But there is a catch. local() will propagate the changed value to the code
below it. The modified value will be in effect until the script
terminates, unless it is changed again somewhere else in the script.

A cleaner approach is to enclose the whole of the code that is affected by
the modified variable in a block, like this:

  {
    local $/ = undef;
    open IN, "file" ....
    # slurp it all inside a variable
    $all_the_file = <IN>;
  }

That way, when Perl leaves the block it restores the original value of the
C<$/> variable, and you don't need to worry elsewhere in your program
about its value being changed here.

Note that if you call a subroutine after you've set a global variable but
within the enclosing block, the global variable will be visible with its
new value inside the subroutine.

=head1 Compiled Regular Expressions

When using a regular expression that contains an interpolated Perl
variable, if it is known that the variable (or variables) will not change
during the execution of the program, a standard optimization technique is
to add the C</o> modifier to the regex pattern. This directs the compiler
to build the internal table once, for the entire lifetime of the script,
rather than every time the pattern is executed. Consider:

  my $pat = '^foo$'; # likely to be input from an HTML form field
  foreach( @list ) {
    print if /$pat/o;
  }

This is usually a big win in loops over lists, or when using the C<grep()>
or C<map()> operators.

In long-lived mod_perl scripts, however, the variable may change with each
invocation and this can pose a problem. The first invocation of a fresh
httpd child will compile the regex and perform the search correctly.
However, all subsequent uses by that child will continue to match the
original pattern, regardless of the current contents of the Perl variables
the pattern is supposed to depend on. Your script will appear to be
broken.

There are two solutions to this problem:

The first is to use C<eval q//>, to force the code to be evaluated each
time. Just make sure that the eval block covers the entire loop of
processing, and not just the pattern match itself.

The above code fragment would be rewritten as:

  my $pat = '^foo$';
  eval q{
    foreach( @list ) {
      print if /$pat/o;
    }
  };

Just saying:

  foreach( @list ) {
    eval q{ print if /$pat/o; };
  }

means that we recompile the regex for every element in the list, even
though the regex doesn't change.

You can use this approach if you require more than one pattern match
operator in a given section of code. If the section contains only one
operator (be it an C<m//> or C<s///>), you can rely on the property of the
null pattern, which reuses the last pattern seen. This leads to the second
solution, which also eliminates the use of eval.

The above code fragment becomes:

  my $pat = '^foo$';
  "something" =~ /$pat/; # dummy match (MUST NOT FAIL!)
  foreach( @list ) {
    print if //;
  }

The only gotcha is that the dummy match that boots the regular expression
engine must absolutely, positively succeed, otherwise the pattern will not
be cached, and the C<//> will match everything. If you can't count on
fixed text to ensure the match succeeds, you have two possibilities.
If you can guarantee that the pattern variable contains no meta-characters
(things like *, +, ^, $...), you can use the dummy match:

  $pat =~ /\Q$pat\E/; # guaranteed if no meta-characters present

If there is a possibility that the pattern can contain meta-characters,
you should search for the pattern or the non-searchable \377 character as
follows:

  "\377" =~ /$pat|^\377$/; # guaranteed if meta-characters present

Another approach exists, whose usefulness depends on the complexity of the
regex to which you apply this technique. One common usage where a compiled
regex is usually more efficient is to "I<match any one of a group of
patterns>" over and over again.

With a helper routine, it's easier to remember. Here is one slightly
modified from Jeffrey Friedl's example in his book "I<Mastering Regular
Expressions>".

  #####################################################
  # Build_MatchMany_Function
  # -- Input:  list of patterns
  # -- Output: A code ref which matches its $_[0]
  #            against ANY of the patterns given in the
  #            "Input", efficiently.
  #
  sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
  }

Example usage:

  @some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
  $Known_Browser = Build_MatchMany_Function(@some_browsers);

  while (<ACCESS_LOG>) {
    # ...
    $browser = get_browser_field($_);
    if ( ! &$Known_Browser($browser) ) {
      print STDERR "Unknown Browser: $browser\n";
    }
    # ...
  }

And of course you can use the qr() operator, which makes the code even
more efficient:

  my $pat = '^foo$';
  my $re  = qr($pat);
  foreach( @list ) {
    print if /$re/;
  }

The qr() operator compiles the pattern for each request, and the compiled
version is then used in the actual match.

=head1 Exception Handling for mod_perl

Here are some guidelines for S<clean(er)> exception handling in mod_perl,
although the technique presented can be applied to all of your Perl
programming.

The reasoning behind this document is the current broken status of
C<$SIG{__DIE__}> in the perl core - see both the perl5-porters and the
mod_perl mailing list archives for details on this discussion. (It's
broken in at least Perl v5.6.0 and probably in later versions as well.)
In short summary, C<$SIG{__DIE__}> is a little bit too global, and catches
exceptions even when you want to catch them yourself, using an C<eval{}>
block.

=head2 Trapping Exceptions in Perl

To trap an exception in Perl we use the C<eval{}> construct. Many people
initially make the mistake of thinking that this is the same as the C<eval
EXPR> construct, which compiles and executes code at run time, but that's
not the case. C<eval{}> compiles at compile time, just like the rest of
your code, and has next to zero run-time penalty. For the hardcore C
programmers among you, it uses the C<setjmp/longjmp> POSIX routines
internally, just like C++ exceptions.

When in an eval block, if the code being executed die()s for any reason,
an exception is thrown. This exception can be caught by examining the
C<$@> variable immediately after the eval block; if C<$@> is true then an
exception occurred and C<$@> contains the exception in the form of a
string.
The full construct looks like this:

  eval
  {
    # Some code here
  }; # Note important semi-colon there
  if ($@) # $@ contains the exception that was thrown
  {
    # Do something with the exception
  }
  else # optional
  {
    # No exception was thrown
  }

Most of the time when you see these exception handlers there is no else
block, because it tends to be OK if the code didn't throw an exception.

Perl's exception handling is similar to that of other languages, though it
may not seem so at first sight:

  Perl                                Other language
  ----------------------------------  ------------------------------------
  eval {                              try {
    # execute here                      // execute here
    # raise our own exception:          // raise our own exception:
    die "Oops" if /error/;              if(error==1){throw Exception.Oops;}
    # execute more                      // execute more
  };                                  }
  if($@) {                            catch {
    # handle exceptions                 switch( Exception.id ) {
    if( $@ =~ /Fail/ ) {                  Fail :
      print "Failed\n" ;                    fprintf( stderr, "Failed\n" ) ;
                                            break ;
    }
    elsif( $@ =~ /Oops/ ) {               Oops :
      # Pass it up the chain                throw Exception ;
      die if $@ =~ /Oops/;
    }
    else {                                default :
      # handle all other                    # handle all other
    }                                       # exceptions here
                                          }
  }                                   }
                                      // If we got here all is OK or handled
  else { # optional
    # all is well
  }
  # all is well or has been handled

=head2 Alternative Exception Handling Techniques

An often suggested method for handling global exceptions in mod_perl, and
other perl programs in general, is a B<__DIE__> handler, which can be set
up by either assigning a function name as a string to C<$SIG{__DIE__}>
(not particularly recommended, because of the possible namespace clashes)
or assigning a code reference to C<$SIG{__DIE__}>. The usual way of doing
so is to use an anonymous subroutine:

  $SIG{__DIE__} = sub { print "Eek - we died with:\n", $_[0]; };

The current problem with this is that C<$SIG{__DIE__}> is a global setting
in your script, so while you can potentially hide away your exceptions in
some external module, the execution of C<$SIG{__DIE__}> is fairly magical,
and interferes not just with your code, but with all code in every module
you import. Beyond the magic involved, C<$SIG{__DIE__}> actually
interferes with perl's normal exception handling mechanism, the C<eval{}>
construct. Witness:

  $SIG{__DIE__} = sub { print "handler\n"; };

  eval {
    print "In eval\n";
    die "Failed for some reason\n";
  };
  if ($@) {
    print "Caught exception: $@";
  }

The code unfortunately prints out:

  In eval
  handler

Which isn't quite what you would expect, especially if that
C<$SIG{__DIE__}> handler is hidden away deep in some other module that you
didn't know about.

There are workarounds, however. One is to localize C<$SIG{__DIE__}> in
every exception trap you write:

  eval {
    local $SIG{__DIE__};
    ...
  };

Obviously this just doesn't scale - you don't want to be doing that for
every exception trap in your code, and it's a slowdown.

A second workaround is to check in your handler if you are trying to catch
this exception:

  $SIG{__DIE__} = sub {
    die $_[0] if $^S;
    print "handler\n";
  };

However this won't work under C<Apache::Registry> - you're always in an
eval block there!

C<$^S> isn't totally reliable in certain Perl versions, e.g. 5.005_03 and
5.6.1 both do the wrong thing with it in certain situations. Instead, you
can use the caller() function to figure out whether we are called in an
eval() context:

  $SIG{__DIE__} = sub {
    my $in_eval = 0;
    for( my $stack = 1; my $sub = (CORE::caller($stack))[3]; $stack++ ) {
      $in_eval = 1 if $sub =~ /^\(eval\)/;
    }
    my_die_handler(@_) unless $in_eval;
  };

The other problem with C<$SIG{__DIE__}> also relates to its global nature.
The other problem with C<$SIG{__DIE__}> also relates to its global
nature. Because you might have more than one application running under
mod_perl, you can't be sure which has set a C<$SIG{__DIE__}> handler
when and for what. This can become extremely confusing when you start
scaling up from a set of simple registry scripts that might rely on
CGI::Carp for global exception handling (which uses C<$SIG{__DIE__}>
to trap exceptions) to having many applications installed with a
variety of exception handling mechanisms in place.

You should warn people about this danger of C<$SIG{__DIE__}> and
inform them of better ways to code. The following material is an
attempt to do just that.

=head2 Better Exception Handling

The C<eval{}> construct in itself is a fairly weak way to handle
exceptions as strings. There's no way to pass more information in your
exception, so you have to handle your exception in more than one
place - at the location the error occurred, in order to construct a
sensible error message, and again in your exception handler to
de-construct that string into something meaningful (unless of course
all you want your exception handler to do is dump the error to the
browser). The other problem is that with the C<eval{}> construct you
have no way of automatically detecting where the exception occurred;
in a C<$SIG{__DIE__}> block you can always use the caller() function
to detect where the error occurred. But we can fix that...

A little known fact about exceptions in perl 5.005 is that you can
call die() with an object. The exception handler receives that object
in C<$@>. This is how you are advised to handle exceptions now, as it
provides an extremely flexible and scalable exceptions solution,
potentially providing almost all of the power of Java exceptions. [As
a footnote here, the only thing really missing from Java exceptions is
a guaranteed "finally" clause, although it's possible to get about
98.62% of the way towards providing that using C<eval{}>.]

=head3 A Little Housekeeping

First though, before we delve into the details, a little housekeeping
is in order. Most, if not all, mod_perl programs consist of a main
routine that is entered, and which then dispatches to another routine
depending on the parameters passed and/or the form values. In a normal
C program this is your main() function, in a mod_perl handler this is
your handler() function/method. The exception to this rule seems to be
Apache::Registry scripts, although the techniques described here can
be easily adapted.

In order to use exception handling to its best advantage you need to
give your script some sort of global exception handler. This is far
simpler than it sounds. If you're using C<Apache::Registry> to emulate
CGI you might consider wrapping your entire script in one big eval
block, but I would discourage that. A better method is to modularize
your script into discrete function calls, one of which should be a
dispatch routine:

  #!/usr/bin/perl -w
  # Apache::Registry script

  eval {
    dispatch();
  };
  if ($@) {
    # handle exception
  }

  sub dispatch {
    ...
  }

This is easier with an ordinary mod_perl handler, as it is natural to
have separate functions rather than one long run-on script:

  MyHandler.pm
  ------------
  sub handler {
    my $r = shift;

    eval {
      dispatch($r);
    };
    if ($@) {
      # handle exception
    }
  }

  sub dispatch {
    my $r = shift;
    ...
  }

Now that the skeleton code is set up, let's create an exception class,
making use of Perl 5.005's ability to throw exception objects.
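Before we write that class, here is the raw mechanism in isolation:
die() accepts a blessed reference, and eval leaves it untouched in
C<$@>. This is just a sketch; the class name C<My::Demo::Error> and
its fields are invented for illustration:

  package My::Demo::Error;    # hypothetical class, for illustration only
  sub new {
    my ($class, %args) = @_;
    bless { %args }, $class;
  }

  package main;

  eval {
    die My::Demo::Error->new( code => 42, text => "something broke" );
  };
  if ( ref $@ and $@->isa('My::Demo::Error') ) {
    # $@ is the object itself, not a string
    printf "caught %s: %s (code %d)\n", ref $@, $@->{text}, $@->{code};
  }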
=head3 An Exception Class

This is a really simple exception class, which does nothing but
contain information. A better implementation would probably also
handle its own exception conditions, but that would be more complex,
requiring separate packages for each exception type.

  My/Exception.pm
  ---------------
  package My::Exception;

  sub AUTOLOAD {
    no strict 'refs', 'subs';
    if ($AUTOLOAD =~ /.*::([A-Z]\w+)$/) {
      my $exception = $1;
      *{$AUTOLOAD} =
        sub {
          shift;
          my ($package, $filename, $line) = caller;
          push @_, caller => {
            package  => $package,
            filename => $filename,
            line     => $line,
          };
          bless { @_ }, "My::Exception::$exception";
        };
      goto &{$AUTOLOAD};
    }
    else {
      die "No such exception class: $AUTOLOAD\n";
    }
  }

  1;

OK, so this is all highly magical, but what does it do? It creates a
simple package that we can import and use as follows:

  use My::Exception;
  die My::Exception->SomeException( foo => "bar" );

The exception class tracks exactly where we died from, using the
caller() mechanism; it also caches exception classes so that
C<AUTOLOAD> is only called the first time (in a given process) an
exception of a particular type is thrown (particularly relevant under
mod_perl).

=head2 Catching Uncaught Exceptions

What about exceptions that are thrown outside of your control? We can
fix this using one of two possible methods. The first is to override
die() globally using the old magical C<$SIG{__DIE__}>, and the second
is the cleaner, non-magical method of overriding the core die()
function with our own die() that throws an exception which makes sense
to our application.

=head3 Using $SIG{__DIE__}

Overriding die() using C<$SIG{__DIE__}> in this case is rather simple;
here's some code:

  $SIG{__DIE__} = sub {
    # wrap plain string exceptions in an object;
    # objects pass straight through
    my $err = ref $_[0]
        ? $_[0]
        : My::Exception->UnCaught( text => join('', @_) );
    die $err;
  };

All this does is catch your exception and re-throw it. It's not as
dangerous as we stated earlier that C<$SIG{__DIE__}> can be, because
we're actually re-throwing the exception, rather than catching it and
stopping there. Even though C<$SIG{__DIE__}> is a global handler,
because we simply re-throw the exception, applications outside of our
control can still catch the exception themselves and not worry about
it.

There's only one slight buglet left, and that's if some external code
die()'s with a string, then catches the exception and tries to do
string comparisons on it, as in:

  eval {
    ... # some code
    die "FATAL ERROR!\n";
  };
  if ($@) {
    if ($@ =~ /^FATAL ERROR/) {
      die $@;
    }
  }

In order to deal with this, we can overload stringification for our
C<My::Exception::UnCaught> class:

  {
    package My::Exception::UnCaught;
    use overload '""' => \&str;

    sub str {
      shift->{text};
    }
  }

We can now let other code happily continue. Note that there is a bug
in Perl 5.6 which may affect people here: stringification does not
occur when an object is operated on by a regular expression (via the
=~ operator). A workaround is to stringify explicitly using qq double
quotes; however, that doesn't help the poor soul who is using other
applications. This bug has been fixed in later versions of Perl.
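Here is a quick, self-contained way to see the overload (and the
explicit-stringification workaround) in action; the object is blessed
by hand, since this sketch doesn't load the full class:

  # demonstrate '""' overloading on a hand-blessed object
  {
    package My::Exception::UnCaught;
    use overload '""' => sub { shift->{text} };
  }

  my $e = bless { text => "FATAL ERROR!\n" }, 'My::Exception::UnCaught';

  print "string form: $e";                      # interpolation triggers overload
  print "matched\n" if "$e" =~ /^FATAL ERROR/;  # qq-quoting: safe on 5.6 too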
=head3 Overriding the Core die() Function

So what if we don't want to touch C<$SIG{__DIE__}> at all? We can
overcome this by overriding the core die() function. This is slightly
more complex than implementing a C<$SIG{__DIE__}> handler, but is far
less magical, and is the right thing to do, according to the
L<perl5-porters mailing list|guide::help/Get_help_with_Perl>.

Overriding core functions has to be done from an external
package/module, so we're going to add the override to our
C<My::Exception> module. Here are the relevant parts:

  use vars qw/@ISA @EXPORT/;
  use Exporter;

  @EXPORT = qw/die/;
  @ISA    = 'Exporter';

  sub die (@); # prototype to match CORE::die

  sub import {
    my $pkg = shift;
    $pkg->export('CORE::GLOBAL', 'die');
    Exporter::import($pkg, @_);
  }

  sub die (@) {
    if (!ref($_[0])) {
      CORE::die My::Exception->UnCaught(text => join('', @_));
    }
    CORE::die $_[0]; # only use the first element because it's an object
  }

That wasn't so bad, was it? We're relying on Exporter's export()
function to do the hard work for us, exporting the die() function into
the C<CORE::GLOBAL> namespace. If we don't want to overload die()
everywhere, this can still be an extremely useful technique. By just
using Exporter's default import() method we can export our new die()
method into any package of our choosing.

This allows us to short-cut the long calling convention, simply die()
with a string, and let the system handle the actual construction of
the object for us. Along with the overloaded stringification above, we
now have a complete exception system (well, mostly complete. Exception
die-hards would argue that there's no "finally" clause and no
exception stack, but that's another topic for another time).

=head2 A Single UnCaught Exception Class

Until the Perl core gets its own base exception class (which will
likely happen for Perl 6, but not sooner), it is vitally important
that you decide upon a single base exception class for all of the
applications that you install on your server, and a single exception
handling technique. The problem comes when you have multiple
applications all doing exception handling and all expecting a certain
type of "UnCaught" exception class. Witness the following application:

  package Foo;

  eval {
    # do something
  };
  if ($@) {
    if ($@->isa('Foo::Exception::Bar')) {
      # handle "Bar" exception
    }
    elsif ($@->isa('Foo::Exception::UnCaught')) {
      # handle uncaught exceptions
    }
  }

All will work well until someone installs application "TrapMe" on the
same machine, which installs its own UnCaught exception handler,
overloading CORE::GLOBAL::die or installing a C<$SIG{__DIE__}>
handler. This is actually a case where using C<$SIG{__DIE__}> might be
preferable, because you can change your handler() routine to look like
this:

  sub handler {
    my $r = shift;

    local $SIG{__DIE__};
    Foo::Exception->Init(); # sets $SIG{__DIE__}

    eval {
      dispatch($r);
    };
    if ($@) {
      # handle exception
    }
  }

  sub dispatch {
    my $r = shift;
    ...
  }

In this case the very nature of C<$SIG{__DIE__}> being a dynamically
scoped variable - one we can localize with local() - has helped us,
something we couldn't fix when overloading CORE::GLOBAL::die. However
there is still a gotcha: if someone has overloaded die() in one of the
applications installed on your mod_perl machine, you get the same
problems all over again. So in short: watch out, and check the source
code of anything you install to make sure it follows your exception
handling technique, or just uses die() with strings.
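The Init() method was left undefined above. A minimal sketch of what
it might contain follows; this is an assumption about its behavior,
not code from a real application (C<Foo::Exception::UnCaught> here
mirrors the earlier C<My::Exception::UnCaught> idea):

  package Foo::Exception;

  # Hypothetical Init(): install this application's __DIE__ handler.
  # The caller has already localized $SIG{__DIE__}, so the setting
  # evaporates at the end of the request and cannot leak into other
  # applications running in the same server.
  sub Init {
    $SIG{__DIE__} = sub {
      CORE::die $_[0] if ref $_[0];  # already an object: re-throw as-is
      CORE::die Foo::Exception::UnCaught->new( text => join '', @_ );
    };
  }

  package Foo::Exception::UnCaught;
  sub new {
    my ($class, %args) = @_;
    bless { %args }, $class;
  }

  1;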
=head2 Some Uses

I'm going to come right out and say it: I abuse this system horribly!
I throw exceptions all over my code, not because I've hit an
"exceptional" bit of code, but because I want to get straight back out
of the current call stack without having every single level of
function call check error codes. One way I use this is to return
Apache return codes:

  # paranoid security check
  die My::Exception->RetCode(code => 204);

This throws a 204 status code (C<HTTP_NO_CONTENT>), which is caught at
my top level exception handler:

  if ($@->isa('My::Exception::RetCode')) {
    return $@->{code};
  }

That last return statement is in my handler() method, so that's the
return code that Apache actually sends. I have other exception
handlers in place for sending Basic Authentication headers and
Redirect headers out. I also have a generic C<My::Exception::OK>
class, which gives me a way to back out completely from where I am,
but register that as an OK thing to do.

Why do I go to these lengths? After all, code like slashcode (the code
behind http://slashdot.org) doesn't need this sort of thing, so why
should my web site? Well, it's really just a matter of scalability and
programming style. There's a lot of literature out there about
exception handling, so I suggest doing some research.

=head2 Conclusions

Here I've demonstrated a simple, scalable (and useful) exception
handling mechanism that fits perfectly with your current code and
provides the programmer with an excellent means to determine what has
happened in his code. Some users might be worried about the overhead
of such code. However, in use I've found accessing the database to be
a much more significant overhead, and this technique is used in code
delivering to thousands of users.

For similar exception handling techniques, see the section
"L<Other Implementations|general::perl_reference::perl_reference/Other_Implementations>".

=head2 The My::Exception class in its entirety

  package My::Exception;

  use vars qw/@ISA @EXPORT $AUTOLOAD/;
  use Exporter;

  @ISA    = 'Exporter';
  @EXPORT = qw/die/;

  sub die (@);

  sub import {
    my $pkg = shift;
    # allow "use My::Exception 'die';" to mean import locally only
    $pkg->export('CORE::GLOBAL', 'die') unless @_;
    Exporter::import($pkg, @_);
  }

  sub die (@) {
    if (!ref($_[0])) {
      CORE::die My::Exception->UnCaught(text => join('', @_));
    }
    CORE::die $_[0];
  }

  {
    package My::Exception::UnCaught;
    use overload '""' => sub { shift->{text} };
  }

  sub AUTOLOAD {
    no strict 'refs', 'subs';
    if ($AUTOLOAD =~ /.*::([A-Z]\w+)$/) {
      my $exception = $1;
      *{$AUTOLOAD} =
        sub {
          shift;
          my ($package, $filename, $line) = caller;
          push @_, caller => {
            package  => $package,
            filename => $filename,
            line     => $line,
          };
          bless { @_ }, "My::Exception::$exception";
        };
      goto &{$AUTOLOAD};
    }
    else {
      CORE::die "No such exception class: $AUTOLOAD\n";
    }
  }

  1;
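As a quick sanity check, here is how the complete class behaves end to
end. This sketch assumes the module above is installed as
My/Exception.pm; the C<Timeout> exception type and the messages are
made up for the demonstration:

  use My::Exception;   # no import list: overrides die() globally

  eval {
    open my $fh, '<', '/no/such/file'
        or die "open failed: $!\n";   # a plain string die...
  };
  print ref $@, "\n";   # ...arrives as My::Exception::UnCaught
  print "$@";           # stringifies back to the original message

  eval {
    die My::Exception->Timeout( seconds => 10 );  # invented type
  };
  print ref $@, "\n";              # My::Exception::Timeout
  print $@->{caller}{line}, "\n";  # where it was thrown from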
=head2 Other Implementations

Some users might find it very useful to have the more C++/Java-like
interface of try/catch functions. These are available in several forms
that all work in slightly different ways. See the documentation for
each module for details:

=over

=item * Error.pm

Graham Barr's excellent OO-styled "try, throw, catch" module (from
L<CPAN|download::third_party/Perl>). This should be considered your
best option for structured exception handling, because it is well
known, well supported, and used by a lot of other applications.

=item * Exception::Class and Devel::StackTrace

by Dave Rolsky, both available from CPAN of course.

C<Exception::Class> is a bit cleaner than the C<AUTOLOAD> method from
above, as it can catch typos in exception class names, whereas the
method above will automatically create a new class for you. In
addition, it lets you create actual class hierarchies for your
exceptions, which can be useful if you want to create exception
classes that provide extra methods or data. For example, an exception
class for database errors could provide a method for returning the SQL
and bound parameters in use at the time of the error.

=item * Try.pm

Tony Olekshy's module. Adds an unwind stack and some other interesting
features. Not on the CPAN. Available at
http://www.avrasoft.com/perl/rfc/try-1136.zip

=back

=head1 Customized __DIE__ Handler

As we saw in the previous sections, it's a bad idea to do:

  require Carp;
  $SIG{__DIE__} = \&Carp::confess;

since it breaks the error propagation within C<eval {}> blocks. But
starting from perl 5.6.x you can use another solution to trace errors.
Suppose you get the error:

  "exit" is not exported by the GLOB(0x88414cc) module at (eval 397) line 1

and you have no clue where it comes from; you can override the die()
function and plug the tracer inside:

  require Carp;
  use subs qw(CORE::GLOBAL::die);
  *CORE::GLOBAL::die = sub {
    if ($_[0] =~ /"exit" is not exported/) {
      local *CORE::GLOBAL::die = sub { CORE::die(@_) };
      Carp::confess(@_); # Carp uses die() internally!
    }
    else {
      CORE::die(@_); # could write &CORE::die to forward @_
    }
  };

Now we can test that it works properly, without breaking the error
propagation of C<eval {}> blocks:

  eval { foo(); };
  warn $@ if $@;

  print "\n";

  eval { poo(); };
  warn $@ if $@;

  sub foo { bar(); }
  sub bar { die qq{"exit" is not exported} }

  sub poo { tar(); }
  sub tar { die "normal exit" }

This prints:

  $ perl -w test
  Subroutine die redefined at test line 5.
  "exit" is not exported at test line 6
          main::__ANON__('"exit" is not exported') called at test line 17
          main::bar() called at test line 16
          main::foo() called at test line 12
          eval {...} called at test line 12

  normal exit at test line 5.

The 'local' in:

  local *CORE::GLOBAL::die = sub { CORE::die(@_) };

is important, so you won't lose the overloaded C<CORE::GLOBAL::die>.

=head1 Maintainers

Maintainer is the person(s) you should contact with updates,
corrections and patches.

=over

=item * Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

=head1 Authors

=over

=item * Stas Bekman E<lt>stas (at) stason.orgE<gt>

=item * Matt Sergeant E<lt>matt (at) sergeant.orgE<gt>

=back

Only the major authors are listed above. For contributors see the
Changes file.

=cut