stas        2002/07/31 07:43:17

  Added:       src/docs/general/hardware hardware.pod
               src/docs/general/multiuser multiuser.pod
               src/docs/general/perl_myth perl_myth.pod
               src/docs/general/perl_reference perl_reference.pod
  Log:
  give pods their own dirs

  Revision  Changes    Path
  1.1                  modperl-docs/src/docs/general/hardware/hardware.pod

Index: hardware.pod
===================================================================

=head1 NAME

Choosing an Operating System and Hardware

=head1 Description

Before you use the techniques documented on this site to tune servers and write code, you need to consider the demands which will be placed on the hardware and the operating system. There is no point in investing a lot of time and money in configuration and coding, only to find that your server's performance is poor because you did not choose a suitable platform in the first place.

While the tips below could apply to many web servers, they are aimed primarily at administrators of mod_perl enabled Apache servers.

Because hardware platforms and operating systems are developing rapidly (even while you are reading this document), this discussion must be in general terms.

=head1 Choosing an Operating System

First, let's talk about Operating Systems (OSs). Most of the time I prefer to use Linux or something from the *BSD family. Although I am personally a Linux devotee, I do not want to start yet another OS war. I will try to talk about the characteristics and features you should be looking for to support an Apache/mod_perl server; once you know what you want from your OS, you can go out and find it. Visit the web sites of the operating systems you are interested in. You can gauge users' opinions by searching the relevant discussions in newsgroup and mailing list archives. Deja - http://deja.com and eGroups - http://egroups.com are good examples. I will leave this fan research to the reader.

=head2 Stability and Robustness

Probably the most important features of an OS are stability and robustness. You are in an Internet business. You do not keep normal 9am to 5pm working hours like many conventional businesses you know. You are open 24 hours a day. You cannot afford to be off-line, for your customers will go shop at another service like yours (unless you have a monopoly :). If the OS of your choice crashes every day, first do a little investigation. There might be a simple reason which you can find and fix. However, there are OSs which won't work unless you reboot them twice a day. You don't want to use an OS of this kind, no matter how good the OS vendor's sales department is. Do not follow flashy advertisements; follow developers' advice instead.

Generally, people who have used an OS for some time can tell you a lot about its stability. Ask them. Try to find people who are doing similar things to what you are planning to do; they may even be using the same software. There are often compatibility issues to resolve, and you may need to become familiar with patching and compiling your OS. It's easy.

=head2 Memory Management

You want an OS with good memory management; some OSs are well known as memory hogs. The same code can use twice as much memory on one OS compared to another. If the size of a mod_perl process is 10MB and you have tens of these running, it definitely adds up!

=head2 Memory Leaks

Some OSs and/or their libraries (e.g. C runtime libraries) suffer from memory leaks. A leak is when some process requests a chunk of memory for temporary storage but does not subsequently release it.
The chunk of memory is then not available for any purpose until the process which requested it dies. We cannot afford such leaks. A single mod_perl process sometimes serves thousands of requests before it terminates, so if a leak occurs on every request, the memory demands could become huge. Of course our code can be the cause of memory leaks as well (check out the C<Apache::Leak> module on CPAN). Certainly, we can reduce the number of requests served over the process' life, but that can degrade performance.

=head2 Sharing Memory

We want an OS with good memory-sharing capabilities. As we have seen, if we preload the modules and scripts at server startup, they are shared between the spawned children (at least for a part of a process' life, since memory pages can become "dirty" and cease to be shared). This feature can reduce memory consumption a lot!
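A common way to exploit this sharing is to preload frequently used modules from the server startup file, so that their compiled code ends up in memory pages shared by all the children. Here is a minimal sketch (the choice of modules is illustrative only, and the path is hypothetical):

  # startup.pl -- loaded once by the parent server, e.g. via
  #   PerlRequire /path/to/startup.pl
  use strict;

  use DBI ();              # preload before the children are forked
  use CGI ();
  CGI->compile(':all');    # precompile CGI.pm's autoloaded methods too

  1;

Everything compiled at this point is inherited by every spawned child and stays shared until a given page is written to.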
=head2 Cost and Support

If we are in a big business, we probably do not mind paying another $1000 for some fancy OS with bundled support. But if our resources are low, we will look for cheaper or free OSs. Free does not mean bad; it can be quite the opposite. Free OSs can have the best support we can find. Some do. It is very easy to understand: most people are not rich and will try a cheaper or free OS first if it does the work for them. Since it really fits their needs, many people keep using it and eventually know it well enough to be able to provide support for others in trouble. Why would they do this for free? One reason is the spirit of the first days of the Internet, when there was no commercial Internet and people helped each other, because someone had helped them in the first place. I was there, I was touched by that spirit and I am keen to keep that spirit alive.

But let's get back to our world. We are living in a material world, and our bosses pay us to keep the systems running. So if you feel that you cannot provide the support yourself and you do not trust the available free resources, you must pay for an OS backed by a company, and blame them for any problem. Your boss wants to be able to sue someone if the project has a problem caused by an external product used in the project. If you buy a product and the company selling it claims support, you have someone to sue or at least to put the blame on.

If we go with Open Source and it fails, we do not have someone to sue... wrong--in recent years many companies have realized how good Open Source products are and have started to provide official support for these products. So your boss cannot just dismiss your suggestion of using an Open Source operating system; you can get paid support just as with any commercial OS vendor.

Also remember that the less money you spend on the OS and software, the more you will be able to spend on faster and stronger hardware.

=head2 Discontinued Products

The OSs in this hazard group tend to be developed by a single company or organization. You might find yourself in a position where you have invested a lot of time and money in developing some proprietary software that is bundled with the OS you chose (say a mod_perl handler which takes advantage of some proprietary features of the OS and which will not run on any other OS). Things are under control, the performance is great, and you sing with happiness on your way to work.

Then, one day, the company which supplies your beloved OS goes bankrupt (not unlikely nowadays), or they produce a newer, incompatible version and will not support the old one (happens all the time). You are stuck with their early masterpiece, no support and no source code! What are you going to do? Invest more money into porting the software to another OS...

Everyone can be hit by this mini-disaster, so it is better to check the background of the company when making your choice. Even so, you never know what will happen tomorrow - in 1980, a company called Tektronix did something similar to one of the Guide reviewers with its microprocessor development system. The guy just had to buy another system. He didn't buy it from Tektronix, of course. The second system never really worked very well and the firm he bought it from went bust before they ever got around to fixing it. So in 1982 he wrote his own microprocessor development system software. It didn't take long, it works fine, and he's still using it 18 years later.

Free and Open Source OSs are probably less susceptible to this kind of problem. Development is usually distributed between many companies and developers, so if a person who developed a really important part of the kernel loses interest in continuing, someone else will pick up the falling flag and carry on. Of course if tomorrow some better project shows up, developers might migrate there and finally drop the development: but in practice people are often given support on older versions and helped to migrate to current versions. Development tends to be more incremental than revolutionary, so upgrades are less traumatic, and there is usually plenty of notice of forthcoming changes so that you have time to plan for them.

Of course with Open Source OSs you can have the source! So you can always have a go yourself, but do not underestimate the amount of work involved. There are many, many man-years of work in an OS.

=head2 OS Releases

Actively developed OSs generally try to keep pace with the latest technology developments, and continually optimize the kernel and other parts of the OS to become better and faster. Nowadays, the Internet and networking in general are the hottest topics for system developers. Sometimes a simple OS upgrade to the latest stable version can save you an expensive hardware upgrade. Also, remember that when you buy new hardware, chances are that the latest software will make the most of it.

If a new product supports an old one by virtue of backwards compatibility with previous products of the same family, you might not reap all the benefits of the new product's features. Perhaps you would get almost the same functionality for much less money by buying an older model of the same product.

=head1 Choosing Hardware

Sometimes the most expensive machine is not the one which provides the best performance. Your demands on the platform hardware are based on many aspects and affect many components. Let's discuss some of them.

In the discussion we use terms that may be unfamiliar to some readers:

=over 4

=item *

Cluster - a group of machines connected together to perform one big or many small computational tasks in a reasonable time. Clustering can also be used to provide 'fail-over', where if one machine fails, its processes are transferred to another without interruption of service.
And you may be able to take one of the machines down for maintenance (or an upgrade) and keep your service running - the main server will simply not dispatch requests to the machine that was taken down.

=item *

Load balancing - users are given the name of one of your machines, but perhaps it cannot stand the heavy load. You can use a clustering approach to distribute the load over a number of machines. The central server, which users access initially when they type the name of your service, works as a dispatcher. It just redirects requests to other machines. Sometimes the central server also collects the results and returns them to the users. You can get the advantages of clustering too. There are many load balancing techniques. (See L<High-Availability Linux Project|download::third_party/High_Availability_Linux_Project> for more info.)

=item *

NIC - Network Interface Card. A hardware component that connects your machine to the network. It sends and receives packets; newer cards can also encrypt and decrypt packets and perform digital signing and verification of them. NICs come in different speed categories, varying from 10Mbps to 10Gbps and faster. The most common type of NIC is the one that implements the Ethernet networking protocol.

=item *

RAM - Random Access Memory. It's the memory that you have in your computer. (Comes in units of 8MB, 16MB, 64MB, 256MB, etc.)

=item *

RAID - Redundant Array of Inexpensive Disks. An array of physical disks, usually treated by the operating system as one single disk, and often forced to appear that way by the hardware. The reason for using RAID is often simply to achieve a high data transfer rate, but it may also be to get adequate disk capacity or high reliability. Redundancy means that the system is capable of continued operation even if a disk fails. There are various types of RAID arrays and several different approaches to implementing them. Some systems provide protection against failure of more than one drive, and some (`hot-swappable') systems allow a drive to be replaced without even stopping the OS. See for example the Linux `HOWTO' documents Disk-HOWTO, Module-HOWTO and Parallel-Processing-HOWTO.

=back

=head2 Machine Strength Demands According to Expected Site Traffic

If you are building a fan site and you want to amaze your friends with a mod_perl guest book, any old 486 machine could do it. If you are in a serious business, it is very important to build a scalable server. If your service is successful and becomes popular, the traffic could double every few days, and you should be ready to add more resources to keep up with the demand. While we could define web server scalability more precisely, the important thing is to make sure that you can add more power to your web server(s) without investing much additional money in software development (you will need a little software effort to connect your servers, if you add more of them). This means that you should choose hardware and OSs that can talk to other machines and become part of a cluster.

On the other hand, if you prepare for a lot of traffic and buy a monster to do the work for you, what happens if your service doesn't prove to be as successful as you thought it would be? Then you've spent too much money, and meanwhile faster processors and other hardware components have been released, so you lose.
Wisdom and prophecy, that's all it takes :)

=head3 Single Strong Machine vs Many Weaker Machines

Let's start with the claim that a four-year-old processor is still very powerful and can be put to good use. Now let's say that for a given amount of money you can probably buy either one new, very strong machine or about ten older but very cheap machines. I claim that with ten old machines connected into a cluster, and by deploying load balancing, you will be able to serve about five times more requests than with one single new machine.

Why is that? Because generally the performance improvement on a new machine is marginal while the price is much higher. Ten machines will do faster disk I/O than one single machine, even if the new disk is quite a bit faster. Yes, you have more administration overhead, but there is a chance you will have it anyway, for in a short time the new machine you have just bought might not stand the load. Then you will have to purchase more equipment and think about how to implement load balancing and web server file system distribution anyway.

Why am I so convinced? Look at the busiest services on the Internet: search engines, web-based email servers and the like -- most of them use a clustering approach. You may not always notice it, because they hide the real implementation behind proxy servers.

=head2 Internet Connection

You have the best hardware you can get, but the service is still crawling. Make sure you have a fast Internet connection. Not as fast as your ISP claims it to be, but as fast as it should be. The ISP might have a very good connection to the Internet, but put many clients on the same line. If these are heavy clients, your traffic will have to share the same line and your throughput will suffer. Think about a dedicated connection and make sure it is truly dedicated. Don't trust the ISP, check it!

The idea of having a connection to B<The Internet> is a little misleading. Many web hosting and co-location companies have large amounts of bandwidth, but still have poor connectivity. The public exchanges, such as MAE-East and MAE-West, frequently become overloaded, yet many ISPs depend on these exchanges. Private peering means that providers can exchange traffic much more quickly.

Also, if your web site is of global interest, check that the ISP has good global connectivity. If the web site is going to be visited mostly by people in a certain country or region, your server should probably be located there.

Bad connectivity can directly influence your machine's performance. Here is a story one of the developers told on the mod_perl mailing list:

  What relationship has 10% packet loss on one upstream provider got
  to do with machine memory?  Yes.. a lot.  For a nightmare week, the
  box was located downstream of a provider who was struggling with
  some serious bandwidth problems of his own... people were
  connecting to the site via this link, and packet loss was such that
  retransmits and tcp stalls were keeping httpd heavies around for
  much longer than normal.. instead of blasting out the data at high
  or even modem speeds, they would be stuck at 1k/sec or stalled
  out... people would press stop and refresh, httpds would take 300
  seconds to timeout on writes to no-one.. it was a nightmare.  Those
  problems didn't go away till I moved the box to a place closer to
  some decent backbones.

  Note that with a proxy, this only keeps a lightweight httpd tied
  up, assuming the page is small enough to fit in the buffers.  If
  you are a busy internet site you always have some slow clients.
This is a difficult thing to simulate in benchmark testing, though.

=head2 I/O Performance

If your service is I/O bound (does a lot of read/write operations to disk), you need a very fast disk, especially if you need a relational database, since databases are the main I/O stream creators. So you should not spend the money on a video card and monitor! A cheap card and a 14" monochrome monitor are perfectly adequate for a web server; you will probably access it by C<telnet> or C<ssh> most of the time anyway. Look for disks with the best price/performance ratio. Of course, ask around and avoid disks that have a reputation for head-crashes and other disasters.

You must think about RAID or similar systems if you have an enormous data set to serve (what is an enormous data set nowadays? Gigabytes, Terabytes?) or you expect really heavy web traffic.

Ok, you have a fast disk, what's next? You need a fast disk controller. There may be one embedded on your computer's motherboard. If the controller is not fast enough, you should buy a faster one. Don't forget that it may be necessary to disable the original controller.

=head2 Memory

Memory should be well tested. Many memory test programs are practically useless. Running a busy system for a few weeks without ever shutting it down is a pretty good memory test. If you increase the amount of RAM on a well-tested box, use well-tested RAM.

How much RAM do you need? Nowadays, the chances are that you will hear: "Memory is cheap, the more you buy the better". But how much is enough? The answer is pretty straightforward: I<you do not want your machine to swap>. When the CPU needs to write something into memory, but memory is already full, it takes the least frequently used memory pages and swaps them out to disk. This means you have to bear the time penalty of writing the data to disk. If another process then references some of the data which happens to be on one of the pages that has just been swapped out, the CPU swaps it back in again, probably swapping out some other data that will be needed very shortly by some other process. Carried to the extreme, the CPU and disk start to I<thrash> hopelessly in circles, without getting any real work done. The less RAM there is, the more often this scenario arises. Worse, you can exhaust swap space as well, and then your troubles really start...

How do you make a decision? You know the highest rate at which your server expects to serve pages and how long it takes on average to serve one. Now you can calculate how many server processes you need. If you know the maximum size to which your servers can grow, you know how much memory you need. If your OS supports L<memory sharing|general::hardware::hardware/Sharing_Memory>, you can make best use of this feature by preloading the modules and scripts at server startup, and so you will need less memory than you have calculated. (A worked sketch of this arithmetic follows at the end of this section.)

Do not forget that other essential system processes need memory as well, so you should plan not only for the web server, but also take into account the other players. Remember that requests can be queued, so you can afford to let your clients wait for a few moments until a server is available to serve them. Most of the time your server will not be under maximum load, but you should be ready to bear the peaks. You need to reserve at least 20% of free memory for peak situations. Many sites have crashed a few moments after a big scoop about them was posted and an unexpected number of requests suddenly came in. (This is called the Slashdot effect, which was born at http://slashdot.org .) If you are about to announce something cool, be aware of the possible consequences.
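Here is the promised back-of-the-envelope sketch of that sizing arithmetic. All the numbers are invented for illustration; substitute your own measurements:

  #!/usr/bin/perl -w
  # rough mod_perl server sizing -- illustrative numbers only
  my $peak_rate = 40;    # expected peak requests per second
  my $duration  = 0.5;   # average seconds to serve one request
  my $proc_size = 10;    # MB of (unshared) memory per server process

  my $servers = $peak_rate * $duration;        # 20 concurrent servers
  my $memory  = $servers * $proc_size * 1.2;   # plus the 20% reserve: 240MB

  print "plan for $servers servers and ${memory}MB of RAM\n";

Remember that this estimates the web server alone; the other system processes still need their share on top of it.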
=head2 CPU

Make sure that the CPU is operating within its specifications. Many boxes are shipped with incorrect settings for CPU clock speed, power supply voltage, etc. Sometimes a cooling fan is not fitted, or it may be ineffective because a cable assembly fouls the fan blades. Like faulty RAM, an overheating processor can cause all kinds of strange and unpredictable things to happen. Some CPUs are known to have bugs which can be serious in certain circumstances. Try not to get one of them.

=head2 Bottlenecks

You might use the most expensive components, but still get bad performance. Why? Let me introduce an annoying word: bottleneck.

A machine is an aggregate of many components. Almost any one of them may become a bottleneck. If you have a fast processor but a small amount of RAM, the RAM will probably be the bottleneck. The processor will be under-utilized; usually it will be waiting for the kernel to swap memory pages in and out, because memory is too small to hold the busiest pages.

If you have a lot of memory, a fast processor and a fast disk, but a slow disk controller, the disk controller will be the bottleneck. The performance will still be bad, and you will have wasted money.

Use a fast NIC that does not create a bottleneck. They are cheap. If the NIC is slow, the whole service is slow. This is a most important component, since web servers are much more often network-bound than they are disk-bound!

=head3 Solving Hardware Requirement Conflicts

It may happen that the combination of software components which you find yourself using gives rise to conflicting requirements for the optimization of tuning parameters. If you can separate the components onto different machines, you may find that this approach (a kind of clustering) solves the problem, at much less cost than buying faster hardware, because you can tune the machines individually to suit the tasks they should perform.

For example, if you need to run a relational database engine and a mod_perl server, it can be wise to put the two on different machines: an RDBMS needs a very fast disk, while mod_perl processes need lots of memory. By placing the two on different machines it's easy to optimize each machine separately and satisfy each software component's requirements in the best way.

=head2 Conclusion

To use your money optimally you have to understand the hardware very well, so you will know what to pick. Otherwise, you should hire a knowledgeable hardware consultant and employ them on a regular basis, since your needs will probably change as time goes by and your hardware will likewise be forced to adapt.

=head1 Maintainers

Maintainer is the person(s) you should contact with updates, corrections and patches.

=over

=item *

Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

=head1 Authors

=over

=item *

Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

Only the major authors are listed above. For contributors see the Changes file.

=cut

  1.1                  modperl-docs/src/docs/general/multiuser/multiuser.pod

Index: multiuser.pod
===================================================================

=head1 NAME

mod_perl for ISPs. mod_perl and Virtual Hosts

=head1 Description

mod_perl hosting by ISPs: fantasy or reality?
This section covers some topics that might be of interest to users looking for an ISP to host their mod_perl-based web site, and to ISPs looking for a way to provide such services.

Today it is a reality: there are a number of ISPs hosting mod_perl, although the number is not as big as we would like it to be. To see a list of ISPs that can provide mod_perl hosting, see L<ISPs supporting mod_perl|help::isps>.

=head1 ISPs providing mod_perl services - a fantasy or a reality

=over 4

=item *

You installed mod_perl on your box at home, and you fell in love with it. So now you want to convert your CGI scripts (which currently are running on your favorite ISP's machine) to run under mod_perl. Then you discover that your ISP has never heard of mod_perl, or refuses to install it for you.

=item *

You are an old sailor in the ISP business, you have seen it all, you know how many ISPs are out there and you know that the sales margins are too low to keep you happy. You are looking for some new service almost no one else provides, to attract more clients to become your users and hopefully to get a bigger slice of the action than your competitors.

=back

If you are a user asking for a mod_perl service, or an ISP considering providing this service, this section should make things clear for both of you.

An ISP has three choices:

=over 4

=item 1

ISPs probably cannot let users run scripts under mod_perl on the main server. There are many reasons for this:

Scripts might leak memory, due to sloppy programming. There will not be enough memory to run as many servers as required, and clients will not be satisfied with the service because it will be slower.

The question of file permissions is a very important issue: any user who is allowed to write and run a CGI script can at least read (if not write) any other files that belong to the same user and/or group the web server is running as. Note that L<it's impossible to run C<suEXEC> and C<cgiwrap> extensions under mod_perl 1.0|guide::install/Is_it_possible_to_run_mod_perl_enabled_Apache_as_suExec_>.

Another issue is the security of database connections. If you use C<Apache::DBI>, by hacking the C<Apache::DBI> code you can pick a connection from the pool of cached connections, even if it was opened by someone else, as long as your scripts run on the same web server.

Yet another security issue is a potential compromise of the system via user code running on the web server. One of the possible solutions is to use the chroot(1) or jail(8) mechanisms, which allow you to run subsystems isolated from the main system. If a subsystem gets compromised, the whole system is still safe.

There are many more things to be aware of, so at this time you have to say I<No>.

Of course as an ISP you can run mod_perl internally, without allowing your users to map their scripts so that they will run under mod_perl. If as a part of your service you provide scripts such as guest books, counters etc. which are not available for user modification, you can still have these scripts running very fast.

=item 2

But hey, why can't I let my users run their own servers, so I can wash my hands of them and not have to worry about how dirty and sloppy their code is (assuming that the users are running their servers under their own usernames, to prevent them from stealing code and data from each other)?

This option is fine as long as you are not concerned about your new system's resource requirements.
If you have even very limited experience with mod_perl, you know that mod_perl enabled Apache servers, while freeing up your CPU and allowing you to run scripts very much faster, have huge memory demands (5-20 times those of plain Apache).

The size depends on the code length, the sloppiness of the programming, possible memory leaks the code might have, and all that multiplied by the number of children each server spawns. A very simple example: a server serving an average number of scripts, demanding 10MB of memory and spawning 10 children, already raises your memory requirements by 100MB (the real requirement is actually much smaller if your OS allows code sharing between processes and programmers exploit these features in their code). Now multiply the average required size by the number of server users you intend to have and you will get the total memory requirement.

Since ISPs never say I<No>, you'd better take the inverse approach - think of the largest memory size you can afford, then divide it by one user's requirements as shown in this example, and you will know how many mod_perl users you can afford :)

But you cannot tell how much memory your users may use? Their requirements for a single server may be very modest, but do you know how many servers they will run? After all, they have full control of I<httpd.conf> - and it has to be this way, since this is essential for the user running mod_perl.

All this rumbling about memory leads to a single question: is it possible to prevent users from using more than X memory? Or another variation of the question: assuming you have as much memory as you want, can you charge users for their average memory usage?

If the answer to either of the above questions is I<Yes>, you are all set and your clients will praise your name for letting them run mod_perl! There are tools to restrict resource usage: see for example the man pages for C<ulimit(3)>, C<getrlimit(2)>, C<setrlimit(2)> and C<sysconf(3)>; the last three have corresponding Perl interfaces in the C<BSD::Resource> and C<Apache::Resource> modules. (A sketch of this technique follows right after this list.)

[ReaderMETA]: If you have experience with other resource limiting techniques please share it with us. Thank you!

If you have chosen this option, you have to provide your clients with:

=over 4

=item *

Shutdown and startup scripts installed together with the rest of your daemon startup scripts (e.g. in the I</etc/rc.d> directory), so that when you reboot your machine the user's server will be correctly shut down and will be back online the moment your system starts up. Also make sure to start each server under the username the server belongs to, or you are going to be in big trouble!

=item *

Proxy services (in forward or httpd accelerator mode) for the user's virtual host. Since the user will have to run their server on an unprivileged port (E<gt>1024), you will have to forward all requests from C<user.given.virtual.hostname:80> (which is just C<user.given.virtual.hostname>, since 80 is the default port) to C<your.machine.ip:port_assigned_to_user>. You will also have to tell the users to code their scripts so that any self-referencing URLs are of the form C<user.given.virtual.hostname>.

Letting the user run a mod_perl server immediately adds a requirement for the user to be able to restart and configure their own server. Only root can bind to port 80; this is why your users have to use port numbers greater than 1024.
Another solution would be to use a setuid startup script, but think twice before you go with it, since if users can modify the scripts they will gain root access. For more information refer to the section "L<SUID Start-up Scripts|general::control::control/SUID_Start_up_Scripts>".

=item *

Another problem you will have to solve is how to assign ports between users. Since users can pick any port above 1024 to run their server on, you will have to lay down some rules here so that multiple servers do not conflict.

A simple example will demonstrate the importance of this problem: suppose I am a malicious user, or just a rival of some fellow who runs his server on your ISP. All I need to do is find out what port my rival's server is listening on (e.g. using C<netstat(8)>) and configure my own server to listen on the same port. Although I am unable to bind to it while his server runs, imagine what will happen when you reboot your system and my startup script happens to be run before my rival's! I get the port first, and now all requests will be redirected to my server. I'll leave it to your imagination what nasty things might happen then. Of course the ugly things will quickly be revealed, but not before the damage has been done.

Luckily there are special tools that can ensure that users who aren't authorized to bind to certain ports (above 1024) won't be able to do so. One such tool is called C<cbs> and its documentation can be found at I<http://www.epita.fr/~flav/cbs/doc/html>.

=back

Basically you can preassign each user a port, without them having to worry about finding a free one, as well as enforce C<MaxClients> and similar values, by implementing the following scenario:

For each user have two configuration files: the main file, I<httpd.conf> (non-writable by the user), and the user's file, I<username.httpd.conf>, where they can specify their own configuration parameters and override the ones defined in I<httpd.conf>. Here is what the main configuration file looks like:

  httpd.conf
  ----------
  # Global/default settings, the user may override some of these
  ...
  ...
  # Included so that user can set his own configuration
  Include username.httpd.conf

  # User-specific settings which will override any potentially
  # dangerous configuration directives in username.httpd.conf
  ...
  ...

  username.httpd.conf
  -------------------
  # Settings that your user would like to add/override,
  # like <Location> and PerlModule directives, etc.

Apache reads the global/default settings first. Then it reads the I<Include>'d I<username.httpd.conf> file with whatever settings the user has chosen, and finally it reads the user-specific settings that we don't want the user to override, such as the port number. Even if the user changes the port number in his I<username.httpd.conf> file, Apache reads our settings last, so they take precedence. Note that you can use L<Perl sections|guide::config/Apache_Configuration_in_Perl> to make the configuration much easier.

=item 3

A much better, but costly solution is I<co-location>. Let the user hook his (or your) stand-alone machine into your network, and forget about this user. Of course either the user or you will have to undertake all the system administration chores, and it will cost your client more money.

Who are the people who seek mod_perl support? They are people who run serious projects/businesses. Money is not usually an obstacle. They can afford a stand-alone box, thus achieving their goal of autonomy whilst keeping their ISP happy.

=back
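Returning to the resource-limiting question raised in option 2 above, here is a minimal, hedged sketch of how such limits can be imposed with C<BSD::Resource> from a child-init handler. The package name and the limit value are invented for illustration; C<Apache::Resource> packages the same idea as a ready-made handler driven by C<PERL_RLIMIT_*> environment variables:

  # in startup.pl -- a sketch, not a drop-in solution
  package My::Limits;
  use BSD::Resource ();

  sub handler {
      # cap each child's data segment at 64MB (soft and hard limit)
      BSD::Resource::setrlimit(
          BSD::Resource::RLIMIT_DATA(), 64 * 2**20, 64 * 2**20
      ) or warn "could not set RLIMIT_DATA\n";
      return 0;   # OK
  }

  1;

It would be installed from the non-writable I<httpd.conf> with C<PerlChildInitHandler My::Limits>, so each child imposes the limit on itself right after the fork, and a runaway script runs out of memory instead of swamping the whole machine.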
=head2 Virtual Servers Technologies

As we have just seen, one of the obstacles to using mod_perl in an ISP environment is the problem of isolating customers using the same machine from each other. A number of virtual server technologies (not to be confused with virtual hosts), both commercial and Open Source, exist today. Here are some of them:

=over

=item * The User-mode Linux Kernel

http://user-mode-linux.sourceforge.net/

User-Mode Linux is a safe, secure way of running Linux versions and Linux processes. Run buggy software, experiment with new Linux kernels or distributions, and poke around in the internals of Linux, all without risking your main Linux setup.

User-Mode Linux gives you a virtual machine that may have more hardware and software virtual resources than your actual, physical computer. Disk storage for the virtual machine is entirely contained inside a single file on your physical machine. You can assign your virtual machine only the hardware access you want it to have. With properly limited access, nothing you do on the virtual machine can change or damage your real computer, or its software.

So if you want to completely protect one user from another, and yourself from your users, this might be yet another alternative to the solutions suggested at the beginning of this chapter.

=item * VMWare Technology

Allows running a few instances of the same or different OSs on the same machine. This technology comes in two flavors:

Open Source: http://www.plex86.org/

Commercial: http://www.vmware.com/

So you may want to run a separate OS for each of your clients.

=item * freeVSD Technology

freeVSD (http://www.freevsd.org) is an open source project sponsored by Idaya Ltd. The software enables ISPs to securely partition their physical servers into many I<virtual servers>, each capable of running popular hosting applications such as Apache, Sendmail and MySQL.

=item * S/390 IBM server

Quoting from http://www.s390.ibm.com/linux/vif/ :

"The S/390 Virtual Image Facility enables you to run tens to hundreds of Linux server images on a single S/390 server. It is ideally suited for those who want to move Linux and/or UNIX workloads deployed on multiple servers onto a single S/390 server, while maintaining the same number of distinct server images. This provides centralized management and operation of the multiple image environment, reducing complexity, easing administration and lowering costs."

In two words, this is a great solution for huge ISPs, as it allows you to run hundreds of mod_perl servers while having only one box to maintain.
The drawback is the price :)

Check out this thread on the I<scalable> mailing list for more details from those who know:

http://archive.develooper.com/[EMAIL PROTECTED]/msg00235.html

=back

=head1 Virtual Hosts in the guide

If you are about to use I<Virtual Hosts> you might want to read these sections:

L<Apache Configuration in Perl|guide::config/Apache_Configuration_in_Perl>

L<Easing the Chores of Configuring Virtual Hosts with mod_macro|guide::config/Configuring_Apache___mod_perl_with_mod_macro>

L<Is There a Way to Provide a Different startup.pl File for Each Individual Virtual Host|guide::config/Is_There_a_Way_to_Provide_a_Different_startup_pl_File_for_Each_Individual_Virtual_Host>

L<Is There a Way to Modify @INC on a Per-Virtual-Host or Per-Location Basis.|guide::config/Is_There_a_Way_to_Modify__INC_on_a_Per_Virtual_Host_or_Per_Location_Basis_>

L<A Script From One Virtual Host Calls a Script with the Same Path From the Other Virtual Host|guide::config/A_Script_From_One_Virtual_Host_Calls_a_Script_with_the_Same_Path_From_the_Other_Virtual_Host>

=head1 Maintainers

Maintainer is the person(s) you should contact with updates, corrections and patches.

=over

=item *

Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

=head1 Authors

=over

=item *

Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

Only the major authors are listed above. For contributors see the Changes file.

=cut

  1.1                  modperl-docs/src/docs/general/perl_myth/perl_myth.pod

Index: perl_myth.pod
===================================================================

=head1 NAME

Popular Perl Complaints and Myths

=head1 Description

This document tries to dispel the myths about Perl and overturn the FUD certain bodies try to spread.

=head1 Abbreviations

=over 4

=item * B<M> = Misconception or Myth

=item * B<R> = Response

=back

=head2 Interpreted vs. Compiled

=over 4

=item M:

Each dynamic Perl page hit needs to load the Perl interpreter and compile the script, then run it, each time a dynamic web page is hit. This dramatically decreases performance and makes Perl an unscalable model, since so much overhead is required to serve each page.

=item R:

This myth was true years ago, before the advent of mod_perl. mod_perl loads the interpreter once into memory and never needs to load it again. Each Perl program is compiled only once. The compiled version is then kept in memory and used each time the program is run. In this way there is no extra overhead when hitting a mod_perl page.

=back

=head3 Interpreted vs. Compiled (More Gory Details)

=over 4

=item R:

Compiled code always has the potential to be faster than interpreted code. Ultimately, all interpreted code needs to be converted to native instructions at some point, and this invariably has to be done by a compiled application.

That said, an interpreted language CAN be faster than a comparable native application in certain situations, given certain common programming practices. For example, the allocation and de-allocation of memory can be a relatively expensive process in a tightly scoped compiled language, whereas interpreted languages typically use garbage collectors which don't need to do expensive deallocation in a tight loop, instead waiting until additional memory is absolutely necessary, or for a less computationally intensive period. Of course, using a garbage collector in C would eliminate this edge in this situation, but where using garbage collectors in C is uncommon, Perl and most other interpreted languages have built-in garbage collectors.
It is also important to point out that few people use the full potential of their modern CPU with a single application. Modern CPUs are not only more than fast enough to run interpreted code; many processors include instruction sets designed to increase the performance of interpreted code.

=back

=head2 Perl is overly memory intensive, making it unscalable

=over 4

=item M:

Each child process needs the Perl interpreter and all code in memory. Even with mod_perl, httpd processes tend to be overly large, slowing performance and requiring much more hardware.

=item R:

In mod_perl the interpreter is loaded into the parent process and shared between the children. Also, when scripts are loaded into the parent and the parent forks a child httpd process, that child shares those scripts with the parent. So while the child may take 6MB of memory, 5MB of that might be shared, meaning it only really uses 1MB per child. Even 5MB of memory per child is not uncommon for most web applications in other languages.

Also, most modern operating systems support the concept of shared libraries. Perl can be compiled as a shared library, enabling the bulk of the Perl interpreter to be shared between processes. Some executable formats on some platforms (I believe ELF is one such format) are able to share entire executable TEXT segments between unrelated processes.

=back

=head3 More Tuning Advice:

=over 4

=item *

L<Stas Bekman's Performance Guide|guide::performance>

=back

=head2 Not enough support, or tools to develop with Perl. (Myth)

=over 4

=item R:

Of all web application languages, Perl arguably has the most support and tools. B<CPAN> is a central repository of Perl modules which are freely downloadable and usually well supported. There are literally thousands of modules which make building web apps in Perl much easier. There are also countless mailing lists of extremely responsive Perl experts who usually respond to questions within an hour. There are also a number of Perl development environments to make building Perl web applications easier. Just to name a few, there are C<Apache::ASP>, C<Mason>, C<Embperl>, C<ePerl>, etc...

=back

=head2 If Perl scales so well, how come no large sites use it? (Myth)

=over 4

=item R:

Actually, many large sites DO use Perl for the bulk of their web applications. Here are some, just as an example: B<eToys>, B<CitySearch>, B<Internet Movie Database> ( http://imdb.com ), B<ValueClick> ( http://valueclick.com ), B<Paramount Digital Entertainment>, B<CMP> ( http://cmpnet.com ), B<HotBot Mail>/B<HotBot Homepages>, and B<DejaNews>, to name a few. Even B<Microsoft> has taken an interest in Perl via http://www.activestate.com/.

=back

=head2 Perl, even with mod_perl, is always slower than C.

=over 4

=item R:

The Perl engine is written in C. There is no point arguing that Perl is faster than C, because anything written in Perl could obviously be re-written in C. The same holds true for arguing that C is faster than assembly.

There are two issues to consider here. First of all, many times a web application written in Perl B<CAN be faster> than one written in C, thanks to the low level optimizations in the Perl compiler. In other words, it's easier to write poorly written C than well written Perl. Secondly, it's important to weigh all factors when choosing a language to build a web application in. Time to market is often one of the highest priorities in creating a web application. Development in Perl can often be twice as fast as in C.
This is mostly due to the differences in the languages themselves, as well as the wealth of free examples and modules which speed development significantly. Perl's speedy development time can be a huge competitive advantage.

=back

=head2 Java does away with the need for Perl.

=over 4

=item M:

Perl had its place in the past, but now there's Java, and Java will kill Perl.

=item R:

Java and Perl are actually more complementary languages than competitive. It's widely accepted that server-side Java solutions such as C<JServ>, C<JSP> and C<JRun> are far slower than mod_perl solutions (see the next myth). Even so, Java is often used as the front end for server-side Perl applications. Unlike Perl, with Java you can create advanced client-side applications. Combined with the strength of server-side Perl, these client-side Java applications can be made very powerful.

=back

=head2 Perl can't create advanced client side applications

=over 4

=item R:

True. There are some client-side Perl solutions, like PerlScript in MSIE 5.0, but all client-side Perl requires the user to have the Perl interpreter on their local machine. Most users do not have a Perl interpreter on their local machine. Most Perl programmers who need to create an advanced client-side application use Java as their client-side programming language and Perl as the server-side solution.

=back

=head2 ASP makes Perl obsolete as a web programming language.

=over 4

=item M:

With Perl you have to write individual programs for each set of pages. With ASP you can write simple code directly within HTML pages. ASP is the Perl killer.

=item R:

There are many solutions which allow you to embed Perl in web pages just like ASP. In fact, you can actually use Perl IN ASP pages with PerlScript. Other solutions include: C<Mason>, C<Apache::ASP>, C<ePerl>, C<Embperl> and C<XPP>. Also, Microsoft and ActiveState have worked very hard to make Perl run equally well on NT as on Unix. You can even create COM modules in Perl that can be used from within ASP pages. Some other advantages Perl has over ASP: mod_perl is usually much faster than ASP, Perl has much more example code and many more full programs which are freely downloadable, and Perl is cross-platform, able to run on Solaris, Linux, SCO, Digital Unix, Unix V, AIX, OS/2, VMS, MacOS, Win95-98 and NT, to name a few. Also, benchmarks show that embedded Perl solutions outperform ASP/VB on IIS by several orders of magnitude. Perl is a much easier language for some to learn, especially those with a background in C or C++.

=back

=head1 Credits

Thanks to the mod_perl list for all of the good information and criticism. I'd especially like to thank:

=over 4

=item * Stas Bekman E<lt>[EMAIL PROTECTED]E<gt>

=item * Thornton Prime E<lt>[EMAIL PROTECTED]E<gt>

=item * Chip Turner E<lt>[EMAIL PROTECTED]E<gt>

=item * Clinton E<lt>[EMAIL PROTECTED]E<gt>

=item * Joshua Chamas E<lt>[EMAIL PROTECTED]E<gt>

=item * John Edstrom E<lt>[EMAIL PROTECTED]E<gt>

=item * Rasmus Lerdorf E<lt>[EMAIL PROTECTED]E<gt>

=item * Nedim Cholich E<lt>[EMAIL PROTECTED]E<gt>

=item * Mike Perry E<lt> http://www.icorp.net/icorp/feedback.htm E<gt>

=item * Finally, I'd like to thank Robert Santos E<lt>[EMAIL PROTECTED]E<gt>, CyberNation's lead Business Development guy, for inspiring this document.

=back

=head1 Maintainers

Maintainer is the person(s) you should contact with updates, corrections and patches.
=over

=item *

Contact the L<mod_perl docs list|maillist::docs-dev>

=back

=head1 Authors

=over

=item *

Adam Pisoni E<lt>[EMAIL PROTECTED]E<gt>

=back

Only the major authors are listed above. For contributors see the Changes file.

=cut

  1.1                  modperl-docs/src/docs/general/perl_reference/perl_reference.pod

Index: perl_reference.pod
===================================================================

=head1 NAME

Perl Reference

=head1 Description

This document was born because some users are reluctant to learn Perl before jumping into mod_perl. I will try to cover some of the most frequent pure-Perl questions asked on the list.

Before you decide to skip this chapter, make sure you know all the information provided here. The rest of the Guide assumes that you have read this chapter and understood it.

=head1 perldoc's Rarely Known But Very Useful Options

First of all, I want to stress that you cannot become a Perl hacker without knowing how to read Perl documentation and search through it. Books are good, but an easily accessible and searchable Perl reference at your fingertips is a great time saver. It always has up-to-date information for the version of perl you're using.

Of course you can use the online Perl documentation on the Web. The two major sites are http://www.perldoc.com and http://theoryx5.uwinnipeg.ca/CPAN/perl/.

The C<perldoc> utility provides you with access to the documentation installed on your system. To find out what Perl manpages are available, execute:

  % perldoc perl

To find what functions Perl has, execute:

  % perldoc perlfunc

To learn the syntax and find examples of a specific function, you would execute (e.g. for C<open()>):

  % perldoc -f open

Note: In perl5.005_03 and earlier, there is a bug in this and the C<-q> option of C<perldoc>. It won't call C<pod2man>, but will display the section in POD format instead. Despite this bug it's still readable and very useful.

The Perl FAQ (the I<perlfaq> manpage) is in several sections. To search through the sections for C<open> you would execute:

  % perldoc -q open

This will show you all the matching Question and Answer sections, still in POD format.

To read the I<perldoc> manpage you would execute:

  % perldoc perldoc

=head1 Tracing Warnings Reports

Sometimes it's very hard to understand what a warning is complaining about. You see the source code, but you cannot understand why some specific snippet produces that warning. The mystery often results from the fact that the code can be called from different places if it's located inside a subroutine.

Here is an example:

  warnings.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  correct();
  incorrect();

  sub correct{
      print_value("Perl");
  }

  sub incorrect{
      print_value();
  }

  sub print_value{
      my $var = shift;
      print "My value is $var\n";
  }

In the code above, print_value() prints the passed value. Subroutine correct() passes the value to print, but in subroutine incorrect() we forgot to pass it. When we run the script:

  % ./warnings.pl

we get the warning:

  Use of uninitialized value at ./warnings.pl line 16.

Perl complains about an undefined variable C<$var> at the line that attempts to print its value:

  print "My value is $var\n";

But how do we know why it is undefined? The reason here obviously is that the calling function didn't pass the argument. But how do we know who the caller was? In our example there are two possible callers; in the general case there can be many of them, perhaps located in other files.
We can use the caller() function, which tells us who called us, but even that might not be enough: it's possible to have a longer sequence of called subroutines, and not just two. For example, here it is sub third() which is at fault, and calling caller() in sub second() would not help us very much:

  sub third{
      second();
  }

  sub second{
      my $var = shift;
      first($var);
  }

  sub first{
      my $var = shift;
      print "Var = $var\n";
  }

The solution is quite simple. What we need is a full call-stack trace back to the call that triggered the warning. The C<Carp> module comes to our aid with its cluck() function. Let's modify the script by adding a couple of lines. The rest of the script is unchanged.

  warnings2.pl
  ------------
  #!/usr/bin/perl -w

  use strict;
  use Carp ();

  local $SIG{__WARN__} = \&Carp::cluck;

  correct();
  incorrect();

  sub correct{
      print_value("Perl");
  }

  sub incorrect{
      print_value();
  }

  sub print_value{
      my $var = shift;
      print "My value is $var\n";
  }

Now when we execute it, we see:

  Use of uninitialized value at ./warnings2.pl line 19.
    main::print_value() called at ./warnings2.pl line 14
    main::incorrect() called at ./warnings2.pl line 7

Take a moment to understand the call-stack trace. The deepest calls are printed first. So the second line tells us that the warning was triggered in print_value(); the third, that print_value() was called by the subroutine incorrect():

  script => incorrect() => print_value()

We go into C<incorrect()> and indeed see that we forgot to pass the variable. Of course when you write a subroutine like C<print_value>, it would be a good idea to check the passed arguments before starting execution. We omitted that step to contrive an easily debugged example.

Sure, you say, I could find that problem by simple inspection of the code! Well, you're right. But I promise you that your task would be quite complicated and time consuming if your code had some thousands of lines. In addition, under mod_perl, certain uses of the C<eval> operator and "here documents" are known to throw off Perl's line numbering, so the messages reporting warnings and errors can have incorrect line numbers. (See L<Finding the Line Which Triggered the Error or Warning|guide::debug/Finding_the_Line_Which_Triggered> for more information.) Getting the trace helps a lot.

=head1 Variables Globally, Lexically Scoped And Fully Qualified

META: this material is new and requires polishing so read with care.

You will hear a lot about namespaces, symbol tables and lexical scoping in Perl discussions, but little of it will make any sense without a few key facts:

=head2 Symbols, Symbol Tables and Packages; Typeglobs

There are two important types of symbol: package global and lexical. We will talk about lexical symbols later; for now we will talk only about package global symbols, which we will refer to simply as I<global symbols>.

The names of pieces of your code (subroutine names) and the names of your global variables are symbols. Global symbols reside in one symbol table or another. The code itself and the data do not; the symbols are the names of pointers which point (indirectly) to the memory areas which contain the code and data. (Note for C/C++ programmers: we use the term `pointer' in the general sense of one piece of data referring to another piece of data, not in the specific sense used in C or C++.)

There is one symbol table for each package (which is why I<global symbols> are really I<package global symbols>). You are always working in one package or another.
Like in C, where the first function you write must be called main(), the first statement of your first Perl script is in package C<main::>, which is the default package. Unless you say otherwise by using the C<package> statement, your symbols are all in package C<main::>. You should be aware straight away that files and packages are I<not related>. You can have any number of packages in a single file, and a single package can be in one file or spread over many files. However, it is very common to have a single package in a single file. To declare a package you write:

  package mypackagename;

From the following line on, you are in package C<mypackagename>, and any symbols you declare reside in that package. When you create a symbol (variable, subroutine etc.), Perl uses the name of the package in which you are currently working as a prefix to create the fully qualified name of the symbol.

When you create a symbol, Perl creates a symbol table entry for that symbol in the current package's symbol table (by default C<main::>). Each symbol table entry is called a I<typeglob>. Each typeglob can hold information on a scalar, an array, a hash, a subroutine (code), a filehandle, a directory handle and a format, all of which have the same name. So you see now that there are two indirections for a global variable: the symbol (the thing's name) points to its typeglob, and the typeglob slot for the thing's type (scalar, array, etc.) points to the data. If we had a scalar and an array with the same name, their name would point to the same typeglob, but for each type of data the typeglob points somewhere different, so the scalar's data and the array's data are completely separate and independent; they just happen to have the same name.

Most of the time, only one part of a typeglob is used (yes, it's a bit wasteful). You will by now know that you distinguish between them by using what the authors of the Camel book call a I<funny character>. So if we have a scalar called `C<line>', we would refer to it in code as C<$line>, and if we had an array of the same name, that would be written C<@line>. Both would point to the same typeglob (which would be called C<*line>), but because of the I<funny character> (also known as I<decoration>) perl won't confuse the two. Of course we might confuse ourselves, so some programmers don't ever use the same name for more than one type of variable.

Every global symbol is in some package's symbol table. To refer to a global symbol we could write the I<fully qualified> name, e.g. C<$main::line>. If we are in the same package as the symbol, we can omit the package name, e.g. C<$line> (unless you use the C<strict> pragma, in which case you will have to predeclare the variable using the C<vars> pragma). We can also omit the package name if we have imported the symbol into our current package's namespace. If we want to refer to a symbol that is in another package and which we haven't imported, we must use the fully qualified name, e.g. C<$otherpkg::box>.

Most of the time you do not need to use the fully qualified symbol name, because most of the time you will refer to package variables from within the package. This is very like C++ class variables. You can work entirely within package C<main::> and never even know you are using a package, nor that the symbols have package names. In a way, this is a pity, because you may fail to learn about packages, and they are extremely useful.

The exception is when you I<import> the variable from another package. This creates an alias for the variable in the I<current> package, so that you can access it without using the fully qualified name.
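Here is a tiny sketch tying these pieces together (the package and variable names are invented; it deliberately runs without C<strict>, so the globals need no predeclaration):

  #!/usr/bin/perl -w

  package mypkg;
  $line = "a scalar";                        # really $mypkg::line
  @line = ("an array", "of the same name");  # same *line typeglob, separate data

  package main;
  print "$mypkg::line\n";    # fully qualified access from main::
  print "@mypkg::line\n";    # the array slot is independent of the scalar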
This creates an alias for the variable in the I<current> package, so that
you can access it without using the fully qualified name.

Whilst global variables are useful for sharing data and are necessary in
some contexts, it is usually wisest to minimize their use and use
I<lexical variables>, discussed next, instead.

Note that when you create a variable, the low-level business of allocating
memory to store the information is handled automatically by Perl. The
interpreter keeps track of the chunks of memory to which the pointers are
pointing and takes care of undefining variables. When all references to a
variable have ceased to exist, the perl garbage collector is free to take
back the memory used, ready for recycling. However, perl almost never
returns memory it has already used to the operating system during the
lifetime of the process.

=head3 Lexical Variables and Symbols

The symbols for lexical variables (i.e. those declared using the keyword
C<my>) are the only symbols which do I<not> live in a symbol table.
Because of this, they are not available from outside the block in which
they are declared. There is no typeglob associated with a lexical
variable, and a lexical variable can refer only to a scalar, an array, a
hash or a code reference. (Since perl-5.6 it can also refer to a file
glob.)

If you need access to the data from outside the package, then you can
return it from a subroutine, or you can create a global variable (i.e. one
which has a package prefix) which points or refers to it, and return that.
The pointer or reference must be global so that you can refer to it by a
fully qualified name. But, just as in C, try to avoid having global
variables. Using OO methods generally solves this problem by providing
methods to get and set the desired value within the object, which can be
lexically scoped inside the package and passed by reference.

The phrase "lexical variable" is a bit of a misnomer; we are really
talking about "lexical symbols". The data can be referenced by a global
symbol too, and in such cases when the lexical symbol goes out of scope
the data will still be accessible through the global symbol. This is
perfectly legitimate and cannot be compared to the terrible mistake of
taking a pointer to an automatic C variable and returning it from a
function--when the pointer is dereferenced there will be a segmentation
fault. (Note for C/C++ programmers: having a function return a pointer to
an auto variable is a disaster in C or C++; the perl equivalent, returning
a reference to a lexical variable created in a function, is normal and
useful.)

=over

=item *

C<my()> vs. C<use vars>:

With use vars(), you are making an entry in the symbol table, and you are
telling the compiler that you are going to be referencing that entry
without an explicit package name.

With my(), NO ENTRY IS PUT IN THE SYMBOL TABLE. The compiler figures out
I<at compile time> which my() variables (i.e. lexical variables) are the
same as each other, and once you hit execute time you cannot go looking
those variables up in the symbol table.

=item *

C<my()> vs. C<local()>:

local() creates a temporally-limited package-based scalar, array, hash, or
glob -- when the scope of definition is exited at runtime, the previous
value (if any) is restored. References to such a variable are *also*
global... only the value changes. (Aside: that is what causes variable
suicide. :)
my() creates a lexically-limited non-package-based scalar, array, or hash
-- when the scope of definition is exited at compile-time, the variable
ceases to be accessible. Any references to such a variable at runtime turn
into unique anonymous variables on each scope exit.

=back

=head2 Additional reading references

For more information see: L<Using global variables and sharing them
between
modules/packages|general::perl_reference::perl_reference/Using_Global_Variables_and_Shari>
and an article by Mark-Jason Dominus about how Perl handles variables and
namespaces, and the difference between C<use vars()> and C<my()> -
http://www.plover.com/~mjd/perl/FAQs/Namespaces.html .

=head1 my() Scoped Variable in Nested Subroutines

Before we proceed, let's make the assumption that we want to develop the
code under the C<strict> pragma. We will use lexically scoped variables
(with the help of the my() operator) whenever possible.

=head2 The Poison

Let's look at this code:

  nested.pl
  -----------
  #!/usr/bin/perl

  use strict;

  sub print_power_of_2 {
    my $x = shift;

    sub power_of_2 {
      return $x ** 2;
    }

    my $result = power_of_2();
    print "$x^2 = $result\n";
  }

  print_power_of_2(5);
  print_power_of_2(6);

Don't let the weird subroutine names fool you; the print_power_of_2()
subroutine should print the square of the number passed to it. Let's run
the code and see whether it works:

  % ./nested.pl

  5^2 = 25
  6^2 = 25

Ouch, something is wrong. Maybe there is a bug in Perl and it doesn't work
correctly with the number 6? Let's try again using 5 and 7:

  print_power_of_2(5);
  print_power_of_2(7);

And run it:

  % ./nested.pl

  5^2 = 25
  7^2 = 25

Wow, does it work only for 5? How about using 3 and 5:

  print_power_of_2(3);
  print_power_of_2(5);

and the result is:

  % ./nested.pl

  3^2 = 9
  5^2 = 9

Now we start to understand--only the first call to the print_power_of_2()
function works correctly. This makes us think that our code has some kind
of memory for the results of the first execution, or that it ignores the
arguments in subsequent executions.

=head2 The Diagnosis

Let's follow the guidelines and use the C<-w> flag. Now execute the code:

  % ./nested.pl

  Variable "$x" will not stay shared at ./nested.pl line 9.
  5^2 = 25
  6^2 = 25

We have never seen such a warning message before and we don't quite
understand what it means. The C<diagnostics> pragma will certainly help
us. Let's enable this pragma before the C<strict> pragma in our code:

  #!/usr/bin/perl -w

  use diagnostics;
  use strict;

And execute it:

  % ./nested.pl

  Variable "$x" will not stay shared at ./nested.pl line 10 (#1)

    (W) An inner (nested) named subroutine is referencing a lexical
    variable defined in an outer subroutine.

    When the inner subroutine is called, it will probably see the value of
    the outer subroutine's variable as it was before and during the
    *first* call to the outer subroutine; in this case, after the first
    call to the outer subroutine is complete, the inner and outer
    subroutines will no longer share a common value for the variable. In
    other words, the variable will no longer be shared.

    Furthermore, if the outer subroutine is anonymous and references a
    lexical variable outside itself, then the outer and inner subroutines
    will never share the given variable.

    This problem can usually be solved by making the inner subroutine
    anonymous, using the sub {} syntax. When inner anonymous subs that
    reference variables in outer subroutines are called or referenced,
    they are automatically rebound to the current values of such
    variables.

  5^2 = 25
  6^2 = 25

Well, now everything is clear.
We have the B<inner> subroutine power_of_2() and the B<outer> subroutine
print_power_of_2() in our code. When the inner power_of_2() subroutine is
called for the first time, it sees the value of the outer
print_power_of_2() subroutine's C<$x> variable. On subsequent calls the
inner subroutine's C<$x> variable won't be updated, no matter what new
values are given to C<$x> in the outer subroutine. There are two copies of
the C<$x> variable, no longer a single one shared by the two routines.

=head2 The Remedy

The C<diagnostics> pragma suggests that the problem can be solved by
making the inner subroutine anonymous.

An anonymous subroutine can act as a I<closure> with respect to lexically
scoped variables. Basically this means that if you define a subroutine in
a particular B<lexical> context at a particular moment, then it will run
in that same context later, even if called from outside that context. The
upshot of this is that when the subroutine B<runs>, you get the same
copies of the lexically scoped variables which were visible when the
subroutine was B<defined>. So you can pass arguments to a function when
you define it, as well as when you invoke it.

Let's rewrite the code to use this technique:

  anonymous.pl
  --------------
  #!/usr/bin/perl

  use strict;

  sub print_power_of_2 {
    my $x = shift;

    my $func_ref = sub {
      return $x ** 2;
    };

    my $result = &$func_ref();
    print "$x^2 = $result\n";
  }

  print_power_of_2(5);
  print_power_of_2(6);

Now C<$func_ref> contains a reference to an anonymous subroutine, which we
later use when we need to get the power of two. Since it is anonymous, the
subroutine will automatically be rebound to the new value of the outer
scoped variable C<$x>, and the results will now be as expected.

Let's verify:

  % ./anonymous.pl

  5^2 = 25
  6^2 = 36

So we can see that the problem is solved.

=head1 Understanding Closures -- the Easy Way

In Perl, a closure is just a subroutine that refers to one or more lexical
variables declared outside the subroutine itself; Perl must therefore
create a distinct clone of the environment for each instance on the way
out. Both named subroutines and anonymous subroutines can be closures.

Here's how to tell if a subroutine is a closure or not:

  for (1..5) {
    push @a, sub { "hi there" };
  }

  for (1..5) {
    {
      my $b;
      push @b, sub { $b."hi there" };
    }
  }

  print "anon normal:\n", join "\t\n", @a, "\n";
  print "anon closure:\n", join "\t\n", @b, "\n";

which generates:

  anon normal:
  CODE(0x80568e4)
  CODE(0x80568e4)
  CODE(0x80568e4)
  CODE(0x80568e4)
  CODE(0x80568e4)

  anon closure:
  CODE(0x804b4c0)
  CODE(0x8056b54)
  CODE(0x8056bb4)
  CODE(0x80594d8)
  CODE(0x8059538)

Note how each code reference from the non-closure is identical, but the
closure form must generate distinct coderefs to point at the distinct
instances of the closure.

And now the same with named subroutines:

  for (1..5) {
    sub a { "hi there" };
    push @a, \&a;
  }

  for (1..5) {
    {
      my $b;
      sub b { $b."hi there" };
      push @b, \&b;
    }
  }

  print "normal:\n", join "\t\n", @a, "\n";
  print "closure:\n", join "\t\n", @b, "\n";

which generates:

  normal:
  CODE(0x80568c0)
  CODE(0x80568c0)
  CODE(0x80568c0)
  CODE(0x80568c0)
  CODE(0x80568c0)

  closure:
  CODE(0x8056998)
  CODE(0x8056998)
  CODE(0x8056998)
  CODE(0x8056998)
  CODE(0x8056998)

We can see that both subroutines generated the same code reference on
every iteration. For the subroutine I<a> that's easy, since it doesn't
include any lexical variables defined outside it in the same lexical
scope. As for the subroutine I<b>, it's indeed a closure, but Perl won't
recompile it since it's a named subroutine (see the I<perlsub> manpage).
This is something that we don't want to happen in our code unless we want
it for this special effect, similar to I<static> variables in C. This is
the underpinning of that famous I<"won't stay shared"> message: a named
subroutine that uses a I<my> variable keeps generating the identical code
reference, and therefore ignores any future changes to the lexical
variables outside of it.

=head2 Mike Guy's Explanation of the Inner Subroutine Behavior

  From: [EMAIL PROTECTED] (M.J.T. Guy)
  Newsgroups: comp.lang.perl.misc
  Subject: Re: Lexical scope and embedded subroutines.
  Date: 6 Jan 1998 18:22:39 GMT
  Message-ID: <[EMAIL PROTECTED]>

In article <[EMAIL PROTECTED]>, Aaron Harsh <[EMAIL PROTECTED]> wrote:

  > Before I read this thread (and perlsub to get the details) I would
  > have assumed the original code was fine.
  >
  > This behavior brings up the following questions:
  >   o Is Perl's behavior some sort of speed optimization?

No, but see below.

  >   o Did the Perl gods just decide that scheme-like behavior was less
  >     important than the pseudo-static variables described in perlsub?

This subject has been kicked about at some length on perl5-porters. The
current behaviour was chosen as the best of a bad job. In the context of
Perl, it's not obvious what "scheme-like behavior" means. So it isn't an
option. See below for details.

  >   o Does anyone else find Perl's behavior counter-intuitive?

*Everyone* finds it counterintuitive. The fact that it only generates a
warning rather than a hard error is part of the Perl Gods' policy of
hurling thunderbolts at those so irreverent as not to use -w.

  >   o Did programming in scheme destroy my ability to judge a decent
  >     language feature?

You're still interested in Perl, so it can't have rotted your brain
completely.

  >   o Have I misremembered how scheme handles these situations?

Probably not.

  >   o Do Perl programmers really care how much Perl acts like scheme?

Some do.

  >   o Should I have stopped this message two or three questions ago?

Yes.

The problem to be solved can be stated as "When a subroutine refers to a
variable which is instantiated more than once (i.e. the variable is
declared in a for loop, or in a subroutine), which instance of that
variable should be used?"

The basic problem is that Perl isn't Scheme (or Pascal or any of the other
comparators that have been used). In almost all lexically scoped languages
(i.e. those in the Algol60 tradition), named subroutines are also
lexically scoped. So the scope of the subroutine is necessarily contained
in the scope of any external variable referred to inside the subroutine.
So there's an obvious answer to the "which instance?" problem.

But in Perl, named subroutines are globally scoped. (But in some future
Perl, you'll be able to write

  my sub lex { ... }

to get lexical scoping.) So the solution adopted by other languages can't
be used.

The next suggestion most people come up with is "Why not use the most
recently instantiated variable?". This Does The Right Thing in many cases,
but fails when recursion or other complications are involved. Consider:

  sub outer {
    inner();
    outer();
    my $trouble;
    inner();
    sub inner { $trouble }
    outer();
    inner();
  }

Which instance of $trouble is to be used for each call of inner()? And
why?

The consensus was that an incomplete solution was unacceptable, so the
simple rule "Use the first instance" was adopted instead. And it is more
efficient than possible alternative rules. But that's not why it was done.
  Mike Guy

=head1 When You Cannot Get Rid of The Inner Subroutine

First you might wonder, why in the world would someone need to define an
inner subroutine? Well, for example, to reduce some of Perl's script
startup overhead you might decide to write a daemon that will compile the
scripts and modules only once, and cache the pre-compiled code in memory.
When some script is to be executed, you just tell the daemon the name of
the script to run, and it will do the rest, much faster since compilation
has already taken place.

Seems like an easy task, and it is. The only problem is that once the
script is compiled, how do you execute it? Or let's put it the other way:
after it was executed for the first time and it stays compiled in the
daemon's memory, how do you call it again? If you could get all developers
to code their scripts so that each has a subroutine called run() that will
actually execute the code in the script, then we've solved half the
problem.

But how does the daemon know to refer to some specific script if they all
run in the C<main::> name space? One solution might be to ask the
developers to declare a package in each and every script, and for the
package name to be derived from the script name. However, since there is a
chance that there will be more than one script with the same name but
residing in different directories, in order to prevent namespace
collisions the directory has to be a part of the package name too. And
don't forget that the script may be moved from one directory to another,
so you will have to make sure that the package name is corrected every
time the script gets moved.

But why enforce these strange rules on developers, when we can arrange for
our daemon to do this work? For every script that the daemon is about to
execute for the first time, the script should be wrapped inside a package
whose name is constructed from the mangled path to the script, and inside
a subroutine called run(). For example if the daemon is about to execute
the script I</tmp/hello.pl>:

  hello.pl
  --------
  #!/usr/bin/perl
  print "Hello\n";

Prior to running it, the daemon will change the code to be:

  wrapped_hello.pl
  ----------------
  package cache::tmp::hello_2epl;

  sub run{
    #!/usr/bin/perl
    print "Hello\n";
  }

The package name is constructed from the prefix C<cache::>, each directory
separation slash is replaced with C<::>, and non-alphanumeric characters
are encoded so that for example C<.> (a dot) becomes C<_2e> (an underscore
followed by the ASCII code for a dot in hex representation).

  % perl -e 'printf "%x",ord(".")'

prints: C<2e>. The encoding is the same as you see in URL encoding, except
that there the C<%> character is used (C<%2E>); but since C<%> has a
special meaning in Perl (the prefix of hash variables) it couldn't be used
here.

Now when the daemon is requested to execute the script I</tmp/hello.pl>,
all it has to do is to build the package name as before, based on the
location of the script, and call its run() subroutine:

  use cache::tmp::hello_2epl;
  cache::tmp::hello_2epl::run();

We have just written a partial prototype of the daemon we wanted. The only
outstanding problem is how to pass the path to the script to the daemon.
This detail is left as an exercise for the reader.

If you are familiar with the C<Apache::Registry> module, you know that it
works in almost the same way. It uses a different package prefix, and the
generic function is called handler() and not run(). The scripts to run are
passed through the HTTP protocol's headers.
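The name mangling itself takes only a few lines of code. Here is a minimal
sketch of such a helper; the function name path2package() is made up for
this illustration and is not part of C<Apache::Registry>:

  # turn a script's path into a unique package name,
  # e.g. /tmp/hello.pl => cache::tmp::hello_2epl
  sub path2package {
    my $path = shift;
    $path =~ s|^/||;                                  # drop the leading slash
    $path =~ s/([^\w\/])/sprintf "_%02x", ord $1/ge;  # encode non-alphanumerics
    $path =~ s|/|::|g;                                # slashes become ::
    return "cache::$path";
  }

  print path2package("/tmp/hello.pl"), "\n"; # prints: cache::tmp::hello_2epl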
Now you understand that there are cases where your normal subroutines can
become inner, since if your script was a simple:

  simple.pl
  ---------
  #!/usr/bin/perl
  sub hello { print "Hello" }
  hello();

Wrapped into a run() subroutine it becomes:

  simple.pl
  ---------
  package cache::simple_2epl;

  sub run{
    #!/usr/bin/perl
    sub hello { print "Hello" }
    hello();
  }

Therefore, hello() is now an inner subroutine, and if you have my() scoped
variables defined and altered outside hello() but used inside it, it won't
work as you expect from the second call onwards, as was explained in the
previous section.

=head2 Remedies for Inner Subroutines

First of all, there is nothing to worry about as long as you don't forget
to turn warnings on. If you do happen to have the "L<my() Scoped Variable
in Nested
Subroutines|general::perl_reference::perl_reference/my_Scoped_Variable_in_Nested_S>"
problem, Perl will always alert you.

Given that you have a script that has this problem, what are the ways to
solve it? There are many of them, and we will discuss some of them here.

We will use the following code to show the different solutions.

  multirun.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run{
    my $counter = 0;

    increment_counter();
    increment_counter();

    sub increment_counter{
      $counter++;
      print "Counter is equal to $counter !\n";
    }
  } # end of sub run

This code executes the run() subroutine three times, which in turn
initializes the C<$counter> variable to 0 every time it is executed, and
then calls the inner subroutine increment_counter() twice. Sub
increment_counter() prints C<$counter>'s value after incrementing it. One
might expect to see the following output:

  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 3]
  Counter is equal to 1 !
  Counter is equal to 2 !

But as we have already learned from the previous sections, this is not
what we are going to see. Indeed, when we run the script we see:

  % ./multirun.pl

  Variable "$counter" will not stay shared at ./multirun.pl line 18.
  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 3 !
  Counter is equal to 4 !
  run: [time 3]
  Counter is equal to 5 !
  Counter is equal to 6 !

Obviously, the C<$counter> variable is not reinitialized on each execution
of run(). It retains its value from the previous execution, and sub
increment_counter() increments that.

One of the workarounds is to use globally declared variables, with the
C<vars> pragma:

  multirun1.pl
  -----------
  #!/usr/bin/perl -w

  use strict;
  use vars qw($counter);

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {
    $counter = 0;

    increment_counter();
    increment_counter();

    sub increment_counter{
      $counter++;
      print "Counter is equal to $counter !\n";
    }
  } # end of sub run

If you run this and the other solutions offered below, the expected output
will be generated:

  % ./multirun1.pl

  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 3]
  Counter is equal to 1 !
  Counter is equal to 2 !

By the way, the warning we saw before has gone, and so has the problem,
since there is no C<my()> (lexically defined) variable used in the nested
subroutine.

Another approach is to use fully qualified variables.
This is better, since less memory will be used, but it adds a typing
overhead:

  multirun2.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {
    $main::counter = 0;

    increment_counter();
    increment_counter();

    sub increment_counter{
      $main::counter++;
      print "Counter is equal to $main::counter !\n";
    }
  } # end of sub run

You can also pass the variable to the subroutine by value and make the
subroutine return it after it has been updated. This adds time and memory
overheads, so it may not be a good idea if the variable can be very large,
or if speed of execution is an issue. Don't rely on the variable being
small during the development of the application; it can grow quite big in
situations you don't expect. For example, a very simple HTML form text
entry field can return a few megabytes of data if one of your users is
bored and wants to test how good your code is. It's not uncommon to see
users copy-and-paste 10Mb core dump files into a form's text fields and
then submit them for your script to process.

  multirun3.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {
    my $counter = 0;

    $counter = increment_counter($counter);
    $counter = increment_counter($counter);

    sub increment_counter{
      my $counter = shift;

      $counter++;
      print "Counter is equal to $counter !\n";

      return $counter;
    }
  } # end of sub run

Finally, you can use references to do the job. The version of
increment_counter() below accepts a reference to the C<$counter> variable
and increments its value after first dereferencing it. When you use a
reference, the variable you use inside the function is physically the same
bit of memory as the one outside the function. This technique is often
used to enable a called function to modify variables in a calling
function.

  multirun4.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {
    my $counter = 0;

    increment_counter(\$counter);
    increment_counter(\$counter);

    sub increment_counter{
      my $r_counter = shift;

      $$r_counter++;
      print "Counter is equal to $$r_counter !\n";
    }
  } # end of sub run

Here is yet another, more obscure approach. We modify the value of
C<$counter> inside the subroutine by using the fact that variables in
C<@_> are aliases for the actual scalar parameters. Thus if you called a
function with two arguments, those would be stored in C<$_[0]> and
C<$_[1]>. In particular, if an element C<$_[0]> is updated, the
corresponding argument is updated (or an error occurs if it is not
updatable, as would be the case when calling the function with a literal,
e.g. I<increment_counter(5)>).

  multirun5.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {
    my $counter = 0;

    increment_counter($counter);
    increment_counter($counter);

    sub increment_counter{
      $_[0]++;
      print "Counter is equal to $_[0] !\n";
    }
  } # end of sub run

Of course, the approach given above should be properly documented.

Here is a solution that avoids the problem entirely by splitting the code
into two files; the first is really just a wrapper and loader, the second
file contains the heart of the code.
  multirun6.pl
  -----------
  #!/usr/bin/perl -w

  use strict;
  require 'multirun6-lib.pl';

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

Separate file:

  multirun6-lib.pl
  ----------------
  use strict;

  my $counter;

  sub run {
    $counter = 0;

    increment_counter();
    increment_counter();
  }

  sub increment_counter{
    $counter++;
    print "Counter is equal to $counter !\n";
  }

  1;

Now you have at least six workarounds to choose from. For more information
please refer to the perlref and perlsub manpages.

=head1 use(), require(), do(), %INC and @INC Explained

=head2 The @INC array

C<@INC> is a special Perl variable which is the equivalent of the shell's
C<PATH> variable. Whereas C<PATH> contains a list of directories to search
for executables, C<@INC> contains a list of directories from which Perl
modules and libraries can be loaded.

When you use(), require() or do() a filename or a module, Perl gets a list
of directories from the C<@INC> variable and searches them for the file it
was requested to load. If the file that you want to load is not located in
one of the listed directories, you have to tell Perl where to find the
file. You can either provide a path relative to one of the directories in
C<@INC>, or you can provide the full path to the file.

=head2 The %INC hash

C<%INC> is another special Perl variable that is used to cache the names
of the files and the modules that were successfully loaded and compiled by
use(), require() or do() statements. Before attempting to load a file or a
module with use() or require(), Perl checks whether it's already in the
C<%INC> hash. If it's there, the loading and therefore the compilation are
not performed at all. Otherwise the file is loaded into memory and an
attempt is made to compile it. do() does unconditional loading--no lookup
in the C<%INC> hash is made.

If the file is successfully loaded and compiled, a new key-value pair is
added to C<%INC>. The key is the name of the file or module as it was
passed to one of the three functions we have just mentioned, and if it was
found in any of the C<@INC> directories except C<"."> the value is the
full path to it in the file system.

The following examples will make it easier to understand the logic.

First, let's see what the contents of C<@INC> are on my system:

  % perl -e 'print join "\n", @INC'

  /usr/lib/perl5/5.00503/i386-linux
  /usr/lib/perl5/5.00503
  /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005
  .

Notice that C<.> (the current directory) is the last directory in the
list.

Now let's load the module C<strict.pm> and see the contents of C<%INC>:

  % perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'

  strict.pm => /usr/lib/perl5/5.00503/strict.pm

Since C<strict.pm> was found in the I</usr/lib/perl5/5.00503/> directory
and I</usr/lib/perl5/5.00503/> is a part of C<@INC>, C<%INC> includes the
full path as the value for the key C<strict.pm>.

Now let's create the simplest module in C</tmp/test.pm>:

  test.pm
  -------
  1;

It does nothing, but returns a true value when loaded. Now let's load it
in different ways:

  % cd /tmp
  % perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => test.pm

Since the file was found relative to C<.> (the current directory), the
relative path is inserted as the value. If we alter C<@INC> by adding
I</tmp> to the end:

  % cd /tmp
  % perl -e 'BEGIN{push @INC, "/tmp"} use test; \
    print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => test.pm

Here we still get the relative path, since the module was found first
relative to C<".">.
The directory I</tmp> was placed after C<.> in the list. If we execute the
same code from a different directory, the C<"."> directory won't match,

  % cd /
  % perl -e 'BEGIN{push @INC, "/tmp"} use test; \
    print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => /tmp/test.pm

so we get the full path. We can also prepend the path with unshift(), so
that it will be used for matching before C<"."> and therefore we will get
the full path as well:

  % cd /tmp
  % perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
    print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => /tmp/test.pm

The code:

  BEGIN{unshift @INC, "/tmp"}

can be replaced with the more elegant:

  use lib "/tmp";

which is almost equivalent to our C<BEGIN> block and is the recommended
approach.

These approaches to modifying C<@INC> can be labor intensive, since if you
want to move the script around in the file-system you have to modify the
path. This can be painful, for example, when you move your scripts from
development to a production server.

There is a module called C<FindBin> which solves this problem in the plain
Perl world, but unfortunately it won't work under mod_perl, since it's a
module and, like any module, it's loaded only once. So the first script
using it will have all the settings correct, but the rest of the scripts
will not, if located in a different directory from the first. For the sake
of completeness, I'll present this module anyway.

If you use this module, you don't need to write a hard-coded path. The
following snippet does all the work for you (the file is I</tmp/load.pl>):

  load.pl
  -------
  #!/usr/bin/perl

  use FindBin ();
  use lib "$FindBin::Bin";
  use test;
  print "test.pm => $INC{'test.pm'}\n";

In the above example C<$FindBin::Bin> is equal to I</tmp>. If we move the
script somewhere else, e.g. I</tmp/new_dir>, then C<$FindBin::Bin> equals
I</tmp/new_dir>.

  % /tmp/load.pl

  test.pm => /tmp/test.pm

This is just like C<use lib>, except that no hard-coded path is required.

You can use this workaround to make it work under mod_perl:

  do 'FindBin.pm';
  unshift @INC, "$FindBin::Bin";
  require test;
  #maybe test::import( ... ) here if need to import stuff

This has a slight overhead because it will load from disk and recompile
the C<FindBin> module on each request. So it may not be worth it.

=head2 Modules, Libraries and Program Files

Before we proceed, let's define what we mean by I<module>, I<library> and
I<program file>.

=over

=item * Libraries

These are files which contain Perl subroutines and other code.

When these are used to break up a large program into manageable chunks,
they don't generally include a package declaration; when they are used as
subroutine libraries, they often do have a package declaration. Their last
statement returns true; a simple C<1;> statement ensures that.

They can be named in any way desired, but generally their extension is
I<.pl>.

Examples:

  config.pl
  ----------
  # No package so defaults to main::
  $dir = "/home/httpd/cgi-bin";
  $cgi = "/cgi-bin";
  1;

  mysubs.pl
  ----------
  # No package so defaults to main::
  sub print_header{
    print "Content-type: text/plain\r\n\r\n";
  }
  1;

  web.pl
  ------------
  package web;
  # Call like this: web::print_with_class('loud',"Don't shout!");
  sub print_with_class{
    my( $class, $text ) = @_;
    print qq{<span class="$class">$text</span>};
  }
  1;

=item * Modules

A file which contains perl subroutines and other code. It generally
declares a package name at the beginning of it.
Modules are generally used either as function libraries (for which I<.pl>
files are still, but less commonly, used), or as object libraries where a
module is used to define a class and its methods.

Its last statement returns true.

The naming convention requires it to have a I<.pm> extension.

Example:

  MyModule.pm
  -----------
  package My::Module;
  $My::Module::VERSION = 0.01;

  sub new{ return bless {}, shift;}
  END { print "Quitting\n"}
  1;

=item * Program Files

Many Perl programs exist as a single file. Under Linux and other Unix-like
operating systems the file often has no suffix, since the operating system
can determine that it is a perl script from the first line (the shebang
line); if it's Apache that executes the code, there is a variety of ways
to tell how and when the file should be executed. Under Windows a suffix
is normally used, for example C<.pl> or C<.plx>.

The program file will normally C<require()> any libraries and C<use()> any
modules it requires for execution.

It will contain Perl code but won't usually have any package names.

Its last statement may return anything or nothing.

=back

=head2 require()

require() reads a file containing Perl code and compiles it. Before
attempting to load the file it looks up the argument in C<%INC> to see
whether it has already been loaded. If it has, require() just returns
without doing a thing. Otherwise an attempt will be made to load and
compile the file.

require() has to find the file it is asked to load. If the argument is a
full path to the file, it just tries to read it. For example:

  require "/home/httpd/perl/mylibs.pl";

If the path is relative, require() will attempt to search for the file in
all the directories listed in C<@INC>. For example:

  require "mylibs.pl";

If there is more than one occurrence of the file with the same name in the
directories listed in C<@INC>, the first occurrence will be used.

The file must return I<TRUE> as the last statement to indicate successful
execution of any initialization code. Since you never know what changes
the file will go through in the future, you cannot be sure that the last
statement will always return I<TRUE>. That's why the suggestion is to put
"C<1;>" at the end of the file.

Although you should use the real filename for most files, if the file is a
L<module|general::perl_reference::perl_reference/Modules__Libraries_and_Program_Files>,
you may use the following convention instead:

  require My::Module;

This is equivalent to:

  require "My/Module.pm";

If require() fails to load the file, either because it couldn't find the
file in question, the code failed to compile, or it didn't return I<TRUE>,
then the program will die(). To prevent this, the require() statement can
be enclosed in an eval() exception-handling block, as in this example:

  require.pl
  ----------
  #!/usr/bin/perl -w

  eval { require "/file/that/does/not/exists"};

  if ($@) {
    print "Failed to load, because : $@"
  }

  print "\nHello\n";

When we execute the program:

  % ./require.pl

  Failed to load, because : Can't locate /file/that/does/not/exists in
  @INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux
  /usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005 .) at require.pl line 3.

  Hello

We see that the program didn't die(), because I<Hello> was printed. This
I<trick> is useful when you want to check whether a user has some module
installed; if she hasn't, it's not critical: perhaps the program can run
without this module, with reduced functionality.
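Here is a minimal sketch of this trick in action. The optional module and
the image file are made-up examples; any module whose absence you can
tolerate would do:

  #!/usr/bin/perl -w

  use strict;

  # try to load the optional module, but don't die if it's missing
  my $have_image_size = eval { require Image::Size; 1 } ? 1 : 0;

  if ($have_image_size) {
    # full functionality: report the image dimensions
    my ($w, $h) = Image::Size::imgsize("logo.png");
    print "logo.png is ${w}x${h}\n";
  }
  else {
    # reduced functionality: skip the dimensions
    print "Image::Size is not installed, skipping image dimensions\n";
  }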
If we remove the eval() part and try again:

  require1.pl
  ----------
  #!/usr/bin/perl -w

  require "/file/that/does/not/exists";
  print "\nHello\n";

  % ./require1.pl

  Can't locate /file/that/does/not/exists in @INC (@INC contains:
  /usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503
  /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005 .) at require1.pl line 3.

The program just die()s in the last example, which is what you want in
most cases.

For more information refer to the perlfunc manpage.

=head2 use()

use(), just like require(), loads and compiles files containing Perl code,
but it works with
L<modules|general::perl_reference::perl_reference/Modules__Libraries_and_Program_Files>
only and is executed at compile time. The only way to specify a module to
load is by its module name and not its filename. If the module is located
in I<MyCode.pm>, the correct way to use() it is:

  use MyCode

and not:

  use "MyCode.pm"

use() translates the passed argument into a file name, replacing C<::>
with the operating system's path separator (normally C</>) and appending
I<.pm> at the end. So C<My::Module> becomes I<My/Module.pm>.

use() is exactly equivalent to:

  BEGIN { require Module; Module->import(LIST); }

Internally it calls require() to do the loading and compilation chores.
When require() finishes its job, import() is called unless C<()> is the
second argument. The following pairs are equivalent:

  use MyModule;
  BEGIN {require MyModule; MyModule->import; }

  use MyModule qw(foo bar);
  BEGIN {require MyModule; MyModule->import("foo","bar"); }

  use MyModule ();
  BEGIN {require MyModule; }

The first pair exports the default tags. This happens if the module sets
C<@EXPORT> to a list of tags to be exported by default. The module's
manpage normally describes what tags are exported by default.

The second pair exports only the tags passed as arguments.

The third pair describes the case where the caller does not want any
symbols to be imported.

C<import()> is not a builtin function; it's just an ordinary static method
call into the C<MyModule> package to tell the module to import the list of
features back into the current package. See the Exporter manpage for more
information.

When you write your own modules, always remember that it's better to use
C<@EXPORT_OK> instead of C<@EXPORT>, since the former doesn't export
symbols unless asked to. Exports pollute the namespace of the module user.
Also avoid short or common symbol names to reduce the risk of name
clashes.

When functions and variables aren't exported you can still access them
using their full names, like C<$My::Module::bar> or C<My::Module::foo()>.
By convention you can use a leading underscore on names to informally
indicate that they are I<internal> and not for public use.

There's a corresponding "C<no>" command that un-imports symbols imported
by C<use>, i.e., it calls C<Module-E<gt>unimport(LIST)> instead of
C<import()>.

=head2 do()

While do() behaves almost identically to require(), it reloads the file
unconditionally. It doesn't check C<%INC> to see whether the file was
already loaded.

If do() cannot read the file, it returns C<undef> and sets C<$!> to report
the error. If do() can read the file but cannot compile it, it returns
C<undef> and puts an error message in C<$@>. If the file is successfully
compiled, do() returns the value of the last expression evaluated.
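Because do() bypasses the C<%INC> cache, it is handy when you actually
want a file re-read on every call, for example to pick up changes in a
simple configuration library. A minimal sketch, assuming a
I<config.pl>-style library like the one shown earlier that ends with a
true value (the path here is made up):

  # reload the configuration on every call; do() skips %INC,
  # so the file is read and recompiled each time
  my $config = "/home/httpd/perl/config.pl";
  do $config or die "Failed to load $config: " . ($@ || $!);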
=head1 Using Global Variables and Sharing Them Between Modules/Packages

It helps when you code your application in a structured way, using the
perl packages, but as you probably know, once you start using packages
it's much harder to share the variables between the various packages. A
configuration package comes to mind as a good example of a package whose
variables should be accessible from other modules.

Of course, using Object Oriented (OO) programming is the best way to
provide access to variables, through access methods. But if you are not
yet ready for OO techniques, you can still benefit from the techniques we
are going to talk about.

=head2 Making Variables Global

When you first wrote C<$x> in your code you created a (package) global
variable. It is visible everywhere in your program, although if used in a
package other than the package in which it was declared (C<main::> by
default), it must be referred to with its fully qualified name, unless you
have imported this variable with import(). This will work only if you do
not use the C<strict> pragma; but you I<have> to use this pragma if you
want to run your scripts under mod_perl. Read L<The strict
pragma|guide::porting/The_strict_pragma> to find out why.

=head2 Making Variables Global With strict Pragma On

First you use:

  use strict;

Then you use:

  use vars qw($scalar %hash @array);

This declares the named variables as package globals in the current
package. They may be referred to within the same file and package with
their unqualified names, and in different files/packages with their fully
qualified names.

With perl5.6 you can use the C<our> operator instead:

  our($scalar, %hash, @array);

If you want to share package global variables between packages, here is
what you can do.

=head2 Using Exporter.pm to Share Global Variables

Assume that you want to share the C<CGI.pm> object (I will use C<$q>)
between your modules. For example, you create it in C<script.pl>, but you
want it to be visible in C<My::HTML>. First, you make C<$q> global.

  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.);
  use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
  $q = CGI->new;

  My::HTML::printmyheader();

Note that we have imported C<$q> from C<My::HTML>. And C<My::HTML> does
the export of C<$q>:

  My/HTML.pm
  ----------------
  package My::HTML;
  use strict;

  BEGIN {
    use Exporter ();

    @My::HTML::ISA       = qw(Exporter);
    @My::HTML::EXPORT    = qw();
    @My::HTML::EXPORT_OK = qw($q);
  }

  use vars qw($q);

  sub printmyheader{
    # Whatever you want to do with $q... e.g.
    print $q->header();
  }

  1;

So the C<$q> is shared between the C<My::HTML> package and C<script.pl>.
It will work vice versa as well, if you create the object in C<My::HTML>
but use it in C<script.pl>. You have true sharing, since if you change
C<$q> in C<script.pl>, it will be changed in C<My::HTML> as well.

What if you need to share C<$q> between more than two packages? For
example you want My::Doc to share C<$q> as well.

You leave C<My::HTML> untouched, and modify I<script.pl> to include:

  use My::Doc qw($q);

Then you add the same C<Exporter> code that we used in C<My::HTML>, into
C<My::Doc>, so that it also exports C<$q>.

One possible pitfall is when you want to use C<My::Doc> in both
C<My::HTML> and I<script.pl>. Only if you add

  use My::Doc qw($q);

into C<My::HTML> will C<$q> be shared. Otherwise C<My::Doc> will not share
C<$q> any more.
To make things clear here is the code:

  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.);
  use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
  use My::Doc  qw($q); # Ditto
  $q = new CGI;

  My::HTML::printmyheader();

  My/HTML.pm
  ----------------
  package My::HTML;
  use strict;

  BEGIN {
    use Exporter ();

    @My::HTML::ISA       = qw(Exporter);
    @My::HTML::EXPORT    = qw();
    @My::HTML::EXPORT_OK = qw($q);
  }

  use vars    qw($q);
  use My::Doc qw($q);

  sub printmyheader{
    # Whatever you want to do with $q... e.g.
    print $q->header();

    My::Doc::printtitle('Guide');
  }

  1;

  My/Doc.pm
  ----------------
  package My::Doc;
  use strict;

  BEGIN {
    use Exporter ();

    @My::Doc::ISA       = qw(Exporter);
    @My::Doc::EXPORT    = qw();
    @My::Doc::EXPORT_OK = qw($q);
  }

  use vars qw($q);

  sub printtitle{
    my $title = shift || 'None';

    print $q->h1($title);
  }

  1;

=head2 Using the Perl Aliasing Feature to Share Global Variables

As the title says, you can import a variable into a script or module
without using C<Exporter.pm>. I have found it useful to keep all the
configuration variables in one module, C<My::Config>. But then I have to
export all the variables in order to use them in other modules, which is
bad for two reasons: it pollutes other packages' namespaces with extra
symbols, which increases the memory requirements; and it adds the overhead
of keeping track of which variables should be exported from the
configuration module, and which imported, for some particular package.

I solve this problem by keeping all the variables in one hash C<%c> and
exporting that. Here is an example of C<My::Config>:

  package My::Config;
  use strict;
  use vars qw(%c);
  %c = (
    # All the configs go here
    scalar_var => 5,

    array_var  => [qw(foo bar)],

    hash_var   => {
                   foo => 'Foo',
                   bar => 'BARRR',
                  },
  );
  1;

Now in packages that want to use the configuration variables I have either
to use the fully qualified names like C<$My::Config::test>, which I
dislike, or import them as described in the previous section. But hey,
since we have only one variable to handle, we can make things even simpler
and save the loading of the C<Exporter.pm> package. We will use the Perl
aliasing feature for exporting and saving the keystrokes:

  package My::HTML;
  use strict;
  use lib qw(.);
    # Global Configuration now aliased to global %c
  use My::Config (); # My/Config.pm in the same dir as script.pl
  use vars qw(%c);
  *c = \%My::Config::c;

  # Now you can access the variables from the My::Config
  print $c{scalar_var};
  print $c{array_var}[0];
  print $c{hash_var}{foo};

Of course C<%c> is global everywhere you use it as described above, and if
you change it somewhere it will affect any other packages you have aliased
C<%My::Config::c> to.

Note that aliases work only with global or local() variables--you cannot
write:

  my *c = \%My::Config::c; # ERROR!

But you can write:

  local *c = \%My::Config::c;

For more information about aliasing, refer to the Camel book, second
edition, pages 51-52.

=head2 Using Non-Hardcoded Configuration Module Names

You have just seen how to use a configuration module for configuration
centralization and easy access to the information stored in it. However,
there is somewhat of a chicken-and-egg problem--how do you let your other
modules know the name of this module? Hardcoding the name is brittle--if
you have only a single project it should be fine, but if you have more
projects which use different configurations and you want to reuse their
code, you will have to find all instances of the hardcoded name and
replace them.
Another solution could be to use the same name for the configuration
module, e.g. C<My::Config>, but to put a different copy of it into
different locations. But this won't work under mod_perl because of the
namespace collision: you cannot load different modules which use the same
name; only the first one will be loaded.

Luckily, there is another solution which allows us to stay flexible.
C<PerlSetVar> comes to the rescue. Just as with environment variables, you
can set the server's global Perl variables, which can be retrieved from
any module and script. Those statements are placed into the I<httpd.conf>
file. For example:

  PerlSetVar FooBaseDir       /home/httpd/foo
  PerlSetVar FooConfigModule  Foo::Config

Now we require() the file where the above configuration will be used:

  PerlRequire /home/httpd/perl/startup.pl

In the I<startup.pl> we might have the following code:

  # retrieve the configuration module path
  use Apache;
  my $s = Apache->server;
  my $base_dir      = $s->dir_config('FooBaseDir')      || '';
  my $config_module = $s->dir_config('FooConfigModule') || '';
  die "FooBaseDir and FooConfigModule aren't set in httpd.conf"
    unless $base_dir and $config_module;

  # build the real path to the config module
  my $path = "$base_dir/$config_module";
  $path =~ s|::|/|g;
  $path .= ".pm";
  # we have something like "/home/httpd/foo/Foo/Config.pm"

  # now we can pull in the configuration module
  require $path;

Now we know the module name and it's loaded, so for example if we need to
use some variables stored in this module to open a database connection, we
will do:

  Apache::DBI->connect_on_init
  ("DBI:mysql:${$config_module.'::DB_NAME'}:${$config_module.'::SERVER'}",
   ${$config_module.'::USER'},
   ${$config_module.'::USER_PASSWD'},
   {
    PrintError => 1, # warn() on errors
    RaiseError => 0, # don't die on error
    AutoCommit => 1, # commit executes immediately
   }
  );

Variables like:

  ${$config_module.'::USER'}

are in our example really:

  $Foo::Config::USER

If you want to access these variables from within your code at run time,
instead of accessing the server object C<$s>, use the request object
C<$r>:

  my $r = shift;
  my $base_dir      = $r->dir_config('FooBaseDir')      || '';
  my $config_module = $r->dir_config('FooConfigModule') || '';

=head1 The Scope of the Special Perl Variables

Special Perl variables like C<$|> (buffering), C<$^T> (script's start
time), C<$^W> (warnings mode), C<$/> (input record separator), C<$\>
(output record separator) and many more are all true global variables;
they do not belong to any particular package (not even C<main::>) and are
universally available. This means that if you change them, you change them
anywhere across the entire program; furthermore you cannot scope them with
my(). However, you can local()ize them, which means that any changes you
apply will only last until the end of the enclosing scope. In the mod_perl
situation, where the child server doesn't usually exit, if in one of your
scripts you modify a global variable, it will be changed for the rest of
the process' life and will affect all the scripts executed by the same
process. Therefore, localizing these variables is highly recommended; I'd
say mandatory.

We will demonstrate the case on the input record separator variable. If
you undefine this variable, the diamond operator (readline) will suck in
the whole file at once, if you have enough memory. Remembering this you
should never write code like the example below:

  $/ = undef; # BAD!
  open IN, "file" ....
  # slurp it all into a variable
  $all_the_file = <IN>;

The proper way is to put the local() keyword before the special variable
is changed, like this:

  local $/ = undef;
  open IN, "file" ....
  # slurp it all inside a variable
  $all_the_file = <IN>;

But there is a catch. local() will propagate the changed value to the code
below it. The modified value will be in effect until the script
terminates, unless it is changed again somewhere else in the script.

A cleaner approach is to enclose the whole of the code that is affected by
the modified variable in a block, like this:

  {
    local $/ = undef;
    open IN, "file" ....
    # slurp it all inside a variable
    $all_the_file = <IN>;
  }

That way, when Perl leaves the block it restores the original value of the
C<$/> variable, and you don't need to worry elsewhere in your program
about its value being changed here.

Note that if you call a subroutine after you've set a global variable but
within the enclosing block, the global variable will be visible with its
new value inside the subroutine.

=head1 Compiled Regular Expressions

When using a regular expression that contains an interpolated Perl
variable, if it is known that the variable (or variables) will not change
during the execution of the program, a standard optimization technique is
to add the C</o> modifier to the regex pattern. This directs the compiler
to build the internal table once, for the entire lifetime of the script,
rather than every time the pattern is executed. Consider:

  my $pat = '^foo$'; # likely to be input from an HTML form field
  foreach( @list ) {
    print if /$pat/o;
  }

This is usually a big win in loops over lists, or when using the C<grep()>
or C<map()> operators.

In long-lived mod_perl scripts, however, the variable may change with each
invocation and this can pose a problem. The first invocation of a fresh
httpd child will compile the regex and perform the search correctly.
However, all subsequent uses by that child will continue to match the
original pattern, regardless of the current contents of the Perl variables
the pattern is supposed to depend on. Your script will appear to be
broken.

There are two solutions to this problem:

The first is to use C<eval q//>, to force the code to be evaluated each
time. Just make sure that the eval block covers the entire loop of
processing, and not just the pattern match itself.

The above code fragment would be rewritten as:

  my $pat = '^foo$';
  eval q{
    foreach( @list ) {
      print if /$pat/o;
    }
  };

Just saying:

  foreach( @list ) {
    eval q{ print if /$pat/o; };
  }

means that we recompile the regex for every element in the list, even
though the regex doesn't change.

You can use this approach if you require more than one pattern match
operator in a given section of code. If the section contains only one
operator (be it an C<m//> or C<s///>), you can rely on the property of the
null pattern, which reuses the last pattern seen. This leads to the second
solution, which also eliminates the use of eval.

The above code fragment becomes:

  my $pat = '^foo$';
  "something" =~ /$pat/; # dummy match (MUST NOT FAIL!)
  foreach( @list ) {
    print if //;
  }

The only gotcha is that the dummy match that boots the regular expression
engine must absolutely, positively succeed, otherwise the pattern will not
be cached, and the C<//> will match everything. If you can't count on
fixed text to ensure the match succeeds, you have two possibilities.
If you can guarantee that the pattern variable contains no meta-characters
(things like *, +, ^, $...), you can use the dummy match:

  $pat =~ /\Q$pat\E/; # guaranteed if no meta-characters present

If there is a possibility that the pattern can contain meta-characters,
you should search for the pattern or the non-searchable \377 character as
follows:

  "\377" =~ /$pat|^\377$/; # guaranteed if meta-characters present

Another approach exists, whose usefulness depends on the complexity of the
regex to which you apply this technique. One common usage where a compiled
regex is usually more efficient is to "I<match any one of a group of
patterns>" over and over again.

With a helper routine, it's easier to remember. Here is one slightly
modified from Jeffrey Friedl's example in his book "I<Mastering Regular
Expressions>".

  #####################################################
  # Build_MatchMany_Function
  # -- Input:  list of patterns
  # -- Output: A code ref which matches its $_[0]
  #            against ANY of the patterns given in the
  #            "Input", efficiently.
  #
  sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
  }

Example usage:

  @some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
  $Known_Browser = Build_MatchMany_Function(@some_browsers);

  while (<ACCESS_LOG>) {
    # ...
    $browser = get_browser_field($_);
    if ( ! &$Known_Browser($browser) ) {
      print STDERR "Unknown Browser: $browser\n";
    }
    # ...
  }

And of course you can use the qr() operator, which makes the code even
more efficient:

  my $pat = '^foo$';
  my $re  = qr($pat);
  foreach( @list ) {
    print if /$re/;
  }

The qr() operator compiles the pattern for each request, and the compiled
version is then used in the actual match.

=head1 Exception Handling for mod_perl

Here are some guidelines for S<clean(er)> exception handling in mod_perl,
although the technique presented can be applied to all of your Perl
programming.

The reasoning behind this document is the current broken status of
C<$SIG{__DIE__}> in the perl core - see both the perl5-porters and the
mod_perl mailing list archives for details on this discussion. (It's
broken in at least Perl v5.6.0 and probably in later versions as well.)
In short summary, C<$SIG{__DIE__}> is a little bit too global, and catches
exceptions even when you want to catch them yourself, using an C<eval{}>
block.

=head2 Trapping Exceptions in Perl

To trap an exception in Perl we use the C<eval{}> construct. Many people
initially make the mistake of thinking that this is the same as the C<eval
EXPR> construct, which compiles and executes code at run time, but that's
not the case. C<eval{}> compiles at compile time, just like the rest of
your code, and has next to zero run-time penalty. For the hardcore C
programmers among you, it uses the C<setjmp/longjmp> POSIX routines
internally, just like C++ exceptions.

When in an eval block, if the code being executed die()s for any reason,
an exception is thrown. This exception can be caught by examining the
C<$@> variable immediately after the eval block; if C<$@> is true then an
exception occurred and C<$@> contains the exception in the form of a
string.
The full construct looks like this:

  eval
  {
    # Some code here
  }; # Note important semi-colon there
  if ($@) # $@ contains the exception that was thrown
  {
    # Do something with the exception
  }
  else # optional
  {
    # No exception was thrown
  }

Most of the time when you see these exception handlers there is no else
block, because it tends to be OK if the code didn't throw an exception.

Perl's exception handling is similar to that of other languages, though it
may not seem so at first sight:

  Perl                                Other language
  ----------------------------------  ------------------------------------
  eval {                              try {
    # execute here                      // execute here
    # raise our own exception:          // raise our own exception:
    die "Oops" if /error/;              if(error==1){throw Exception.Oops;}
    # execute more                      // execute more
  };                                  }
  if($@) {                            catch {
    # handle exceptions                 switch( Exception.id ) {
    if( $@ =~ /Fail/ ) {                  Fail :
      print "Failed\n" ;                    fprintf( stderr, "Failed\n" ) ;
                                            break ;
    }
    elsif( $@ =~ /Oops/ ) {               Oops :
      # Pass it up the chain                throw Exception ;
      die if $@ =~ /Oops/;
    }
    else {                                default :
      # handle all other                    # handle all other
    }                                       # exceptions here
                                          }
  }                                   }
                                      // If we got here all is OK or handled
  else { # optional
    # all is well
  }
  # all is well or has been handled

=head2 Alternative Exception Handling Techniques

An often suggested method for handling global exceptions in mod_perl, and
other perl programs in general, is a B<__DIE__> handler, which can be set
up by either assigning a function name as a string to C<$SIG{__DIE__}>
(not particularly recommended, because of the possible namespace clashes)
or assigning a code reference to C<$SIG{__DIE__}>. The usual way of doing
so is to use an anonymous subroutine:

  $SIG{__DIE__} = sub { print "Eek - we died with:\n", $_[0]; };

The current problem with this is that C<$SIG{__DIE__}> is a global setting
in your script, so while you can potentially hide away your exceptions in
some external module, the execution of C<$SIG{__DIE__}> is fairly magical,
and interferes not just with your code, but with all code in every module
you import. Beyond the magic involved, C<$SIG{__DIE__}> actually
interferes with perl's normal exception handling mechanism, the C<eval{}>
construct. Witness:

  $SIG{__DIE__} = sub { print "handler\n"; };

  eval {
    print "In eval\n";
    die "Failed for some reason\n";
  };
  if ($@) {
    print "Caught exception: $@";
  }

The code unfortunately prints out:

  In eval
  handler

Which isn't quite what you would expect, especially if that
C<$SIG{__DIE__}> handler is hidden away deep in some other module that you
didn't know about.

There are workarounds, however. One is to localize C<$SIG{__DIE__}> in
every exception trap you write:

  eval {
    local $SIG{__DIE__};
    ...
  };

Obviously this just doesn't scale - you don't want to be doing that for
every exception trap in your code, and it's a slowdown.

A second workaround is to check in your handler if you are trying to catch
this exception:

  $SIG{__DIE__} = sub {
    die $_[0] if $^S;
    print "handler\n";
  };

However this won't work under C<Apache::Registry> - you're always in an
eval block there!

C<$^S> isn't totally reliable in certain Perl versions, e.g. 5.005_03 and
5.6.1 both do the wrong thing with it in certain situations. Instead, you
can use the caller() function to figure out whether we are called in an
eval() context:

  $SIG{__DIE__} = sub {
    my $in_eval = 0;
    for( my $stack = 1; my $sub = (CORE::caller($stack))[3]; $stack++ ) {
      $in_eval = 1 if $sub =~ /^\(eval\)/;
    }
    my_die_handler(@_) unless $in_eval;
  };

The other problem with C<$SIG{__DIE__}> also relates to its global nature.
The other problem with C<$SIG{__DIE__}> also relates to its global
nature. Because you might have more than one application running under
mod_perl, you can't be sure which has set a C<$SIG{__DIE__}> handler
when and for what. This can become extremely confusing when you start
scaling up from a set of simple registry scripts that might rely on
CGI::Carp for global exception handling (which uses C<$SIG{__DIE__}>
to trap exceptions) to having many applications installed with a
variety of exception handling mechanisms in place.

You should warn people about this danger of C<$SIG{__DIE__}> and
inform them of better ways to code. The following material is an
attempt to do just that.

=head2 Better Exception Handling

The C<eval{}> construct in itself is a fairly weak way to handle
exceptions as strings. There's no way to pass more information in your
exception, so you have to handle your exception in more than one
place - at the location the error occurred, in order to construct a
sensible error message, and again in your exception handler to
de-construct that string into something meaningful (unless of course
all you want your exception handler to do is dump the error to the
browser). The other problem is that with the C<eval{}> construct you
have no way of automatically detecting where the exception occurred;
in a C<$SIG{__DIE__}> block you can always use the caller() function
to detect where the error occurred. But we can fix that...

A little known fact about exceptions in perl 5.005 is that you can
call die() with an object. The exception handler receives that object
in C<$@>. This is how you are advised to handle exceptions now, as it
provides an extremely flexible and scalable exceptions solution,
potentially providing almost all of the power of Java exceptions. [As
a footnote here, the only thing really missing from Java exceptions is
a guaranteed "finally" clause, although it's possible to get about
98.62% of the way towards providing that using C<eval{}>.]

=head3 A Little Housekeeping

First though, before we delve into the details, a little housekeeping
is in order. Most, if not all, mod_perl programs consist of a main
routine that is entered, and which then dispatches to another routine
depending on the parameters passed and/or the form values. In a normal
C program this is your main() function, in a mod_perl handler this is
your handler() function/method. The exception to this rule seems to be
Apache::Registry scripts, although the techniques described here can
be easily adapted.

In order to use exception handling to its best advantage you need to
give your script some sort of global exception handler. This is far
simpler than it sounds. If you're using C<Apache::Registry> to emulate
CGI you might consider wrapping your entire script in one big eval
block, but I would discourage that. A better method is to modularize
your script into discrete function calls, one of which should be a
dispatch routine:

  #!/usr/bin/perl -w
  # Apache::Registry script

  eval {
    dispatch();
  };
  if ($@) {
    # handle exception
  }

  sub dispatch {
    ...
  }

This is easier with an ordinary mod_perl handler, as it is natural to
have separate functions rather than one long run-on script:

  MyHandler.pm
  ------------
  sub handler {
    my $r = shift;

    eval {
      dispatch($r);
    };
    if ($@) {
      # handle exception
    }
  }

  sub dispatch {
    my $r = shift;
    ...
  }

Now that the skeleton code is set up, let's create an exception class,
making use of Perl 5.005's ability to throw exception objects.
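Before we write that class, here is the raw mechanism in isolation:
die() accepts a blessed reference, and eval leaves it untouched in
C<$@>. This is just a sketch; the class name C<My::Demo::Error> and
its fields are invented for illustration:

  package My::Demo::Error;    # hypothetical class, for illustration only
  sub new {
    my ($class, %args) = @_;
    bless { %args }, $class;
  }

  package main;

  eval {
    die My::Demo::Error->new( code => 42, text => "something broke" );
  };
  if ( ref $@ and $@->isa('My::Demo::Error') ) {
    # $@ is the object itself, not a string
    printf "caught %s: %s (code %d)\n", ref $@, $@->{text}, $@->{code};
  }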
=head3 An Exception Class

This is a really simple exception class, which does nothing but
contain information. A better implementation would probably also
handle its own exception conditions, but that would be more complex,
requiring separate packages for each exception type.

  My/Exception.pm
  ---------------
  package My::Exception;

  sub AUTOLOAD {
    no strict 'refs', 'subs';
    if ($AUTOLOAD =~ /.*::([A-Z]\w+)$/) {
      my $exception = $1;
      *{$AUTOLOAD} =
        sub {
          shift;
          my ($package, $filename, $line) = caller;
          push @_, caller => {
            package  => $package,
            filename => $filename,
            line     => $line,
          };
          bless { @_ }, "My::Exception::$exception";
        };
      goto &{$AUTOLOAD};
    }
    else {
      die "No such exception class: $AUTOLOAD\n";
    }
  }

  1;

OK, so this is all highly magical, but what does it do? It creates a
simple package that we can import and use as follows:

  use My::Exception;
  die My::Exception->SomeException( foo => "bar" );

The exception class tracks exactly where we died from, using the
caller() mechanism; it also caches exception classes so that
C<AUTOLOAD> is only called the first time (in a given process) an
exception of a particular type is thrown (particularly relevant under
mod_perl).

=head2 Catching Uncaught Exceptions

What about exceptions that are thrown outside of your control? We can
fix this using one of two possible methods. The first is to override
die() globally using the old magical C<$SIG{__DIE__}>, and the second
is the cleaner, non-magical method of overriding the core die()
function with our own die() that throws an exception which makes sense
to our application.

=head3 Using $SIG{__DIE__}

Overriding die() using C<$SIG{__DIE__}> in this case is rather simple;
here's some code:

  $SIG{__DIE__} = sub {
    # wrap plain string exceptions in an object;
    # objects pass straight through
    my $err = ref $_[0]
        ? $_[0]
        : My::Exception->UnCaught( text => join('', @_) );
    die $err;
  };

All this does is catch your exception and re-throw it. It's not as
dangerous as we stated earlier that C<$SIG{__DIE__}> can be, because
we're actually re-throwing the exception, rather than catching it and
stopping there. Even though C<$SIG{__DIE__}> is a global handler,
because we simply re-throw the exception, applications outside of our
control can still catch the exception themselves and not worry about
it.

There's only one slight buglet left, and that's if some external code
die()'s with a string, then catches the exception and tries to do
string comparisons on it, as in:

  eval {
    ... # some code
    die "FATAL ERROR!\n";
  };
  if ($@) {
    if ($@ =~ /^FATAL ERROR/) {
      die $@;
    }
  }

In order to deal with this, we can overload stringification for our
C<My::Exception::UnCaught> class:

  {
    package My::Exception::UnCaught;
    use overload '""' => \&str;

    sub str {
      shift->{text};
    }
  }

We can now let other code happily continue. Note that there is a bug
in Perl 5.6 which may affect people here: stringification does not
occur when an object is operated on by a regular expression (via the
=~ operator). A workaround is to stringify explicitly using qq double
quotes; however, that doesn't help the poor soul who is using other
applications. This bug has been fixed in later versions of Perl.
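Here is a quick, self-contained way to see the overload (and the
explicit-stringification workaround) in action; the object is blessed
by hand, since this sketch doesn't load the full class:

  # demonstrate '""' overloading on a hand-blessed object
  {
    package My::Exception::UnCaught;
    use overload '""' => sub { shift->{text} };
  }

  my $e = bless { text => "FATAL ERROR!\n" }, 'My::Exception::UnCaught';

  print "string form: $e";                      # interpolation triggers overload
  print "matched\n" if "$e" =~ /^FATAL ERROR/;  # qq-quoting: safe on 5.6 too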
=head3 Overriding the Core die() Function

So what if we don't want to touch C<$SIG{__DIE__}> at all? We can
overcome this by overriding the core die() function. This is slightly
more complex than implementing a C<$SIG{__DIE__}> handler, but is far
less magical, and is the right thing to do, according to the
L<perl5-porters mailing list|guide::help/Get_help_with_Perl>.

Overriding core functions has to be done from an external
package/module, so we're going to add the override to our
C<My::Exception> module. Here are the relevant parts:

  use vars qw/@ISA @EXPORT/;
  use Exporter;

  @EXPORT = qw/die/;
  @ISA    = 'Exporter';

  sub die (@); # prototype to match CORE::die

  sub import {
    my $pkg = shift;
    $pkg->export('CORE::GLOBAL', 'die');
    Exporter::import($pkg, @_);
  }

  sub die (@) {
    if (!ref($_[0])) {
      CORE::die My::Exception->UnCaught(text => join('', @_));
    }
    CORE::die $_[0]; # only use the first element because it's an object
  }

That wasn't so bad, was it? We're relying on Exporter's export()
function to do the hard work for us, exporting the die() function into
the C<CORE::GLOBAL> namespace. If we don't want to overload die()
everywhere, this can still be an extremely useful technique. By just
using Exporter's default import() method we can export our new die()
method into any package of our choosing.

This allows us to short-cut the long calling convention, simply die()
with a string, and let the system handle the actual construction of
the object for us. Along with the overloaded stringification above, we
now have a complete exception system (well, mostly complete. Exception
die-hards would argue that there's no "finally" clause and no
exception stack, but that's another topic for another time).

=head2 A Single UnCaught Exception Class

Until the Perl core gets its own base exception class (which will
likely happen for Perl 6, but not sooner), it is vitally important
that you decide upon a single base exception class for all of the
applications that you install on your server, and a single exception
handling technique. The problem comes when you have multiple
applications all doing exception handling and all expecting a certain
type of "UnCaught" exception class. Witness the following application:

  package Foo;

  eval {
    # do something
  };
  if ($@) {
    if ($@->isa('Foo::Exception::Bar')) {
      # handle "Bar" exception
    }
    elsif ($@->isa('Foo::Exception::UnCaught')) {
      # handle uncaught exceptions
    }
  }

All will work well until someone installs application "TrapMe" on the
same machine, which installs its own UnCaught exception handler,
overloading CORE::GLOBAL::die or installing a C<$SIG{__DIE__}>
handler. This is actually a case where using C<$SIG{__DIE__}> might be
preferable, because you can change your handler() routine to look like
this:

  sub handler {
    my $r = shift;

    local $SIG{__DIE__};
    Foo::Exception->Init(); # sets $SIG{__DIE__}

    eval {
      dispatch($r);
    };
    if ($@) {
      # handle exception
    }
  }

  sub dispatch {
    my $r = shift;
    ...
  }

In this case the very nature of C<$SIG{__DIE__}> being a dynamically
scoped variable - one we can localize with local() - has helped us,
something we couldn't fix when overloading CORE::GLOBAL::die. However
there is still a gotcha: if someone has overloaded die() in one of the
applications installed on your mod_perl machine, you get the same
problems all over again. So in short: watch out, and check the source
code of anything you install to make sure it follows your exception
handling technique, or just uses die() with strings.
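The Init() method was left undefined above. A minimal sketch of what
it might contain follows; this is an assumption about its behavior,
not code from a real application (C<Foo::Exception::UnCaught> here
mirrors the earlier C<My::Exception::UnCaught> idea):

  package Foo::Exception;

  # Hypothetical Init(): install this application's __DIE__ handler.
  # The caller has already localized $SIG{__DIE__}, so the setting
  # evaporates at the end of the request and cannot leak into other
  # applications running in the same server.
  sub Init {
    $SIG{__DIE__} = sub {
      CORE::die $_[0] if ref $_[0];  # already an object: re-throw as-is
      CORE::die Foo::Exception::UnCaught->new( text => join '', @_ );
    };
  }

  package Foo::Exception::UnCaught;
  sub new {
    my ($class, %args) = @_;
    bless { %args }, $class;
  }

  1;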
=head2 Some Uses

I'm going to come right out and say it: I abuse this system horribly!
I throw exceptions all over my code, not because I've hit an
"exceptional" bit of code, but because I want to get straight back out
of the current call stack without having every single level of
function call check error codes. One way I use this is to return
Apache return codes:

  # paranoid security check
  die My::Exception->RetCode(code => 204);

This throws a 204 status code (C<HTTP_NO_CONTENT>), which is caught at
my top level exception handler:

  if ($@->isa('My::Exception::RetCode')) {
    return $@->{code};
  }

That last return statement is in my handler() method, so that's the
return code that Apache actually sends. I have other exception
handlers in place for sending Basic Authentication headers and
Redirect headers out. I also have a generic C<My::Exception::OK>
class, which gives me a way to back out completely from where I am,
but register that as an OK thing to do.

Why do I go to these lengths? After all, code like slashcode (the code
behind http://slashdot.org) doesn't need this sort of thing, so why
should my web site? Well, it's really just a matter of scalability and
programming style. There's a lot of literature out there about
exception handling, so I suggest doing some research.

=head2 Conclusions

Here I've demonstrated a simple, scalable (and useful) exception
handling mechanism that fits perfectly with your current code and
provides the programmer with an excellent means to determine what has
happened in his code. Some users might be worried about the overhead
of such code. However, in use I've found accessing the database to be
a much more significant overhead, and this technique is used in code
delivering to thousands of users.

For similar exception handling techniques, see the section
"L<Other Implementations|general::perl_reference::perl_reference/Other_Implementations>".

=head2 The My::Exception class in its entirety

  package My::Exception;

  use vars qw/@ISA @EXPORT $AUTOLOAD/;
  use Exporter;

  @ISA    = 'Exporter';
  @EXPORT = qw/die/;

  sub die (@);

  sub import {
    my $pkg = shift;
    # allow "use My::Exception 'die';" to mean import locally only
    $pkg->export('CORE::GLOBAL', 'die') unless @_;
    Exporter::import($pkg, @_);
  }

  sub die (@) {
    if (!ref($_[0])) {
      CORE::die My::Exception->UnCaught(text => join('', @_));
    }
    CORE::die $_[0];
  }

  {
    package My::Exception::UnCaught;
    use overload '""' => sub { shift->{text} };
  }

  sub AUTOLOAD {
    no strict 'refs', 'subs';
    if ($AUTOLOAD =~ /.*::([A-Z]\w+)$/) {
      my $exception = $1;
      *{$AUTOLOAD} =
        sub {
          shift;
          my ($package, $filename, $line) = caller;
          push @_, caller => {
            package  => $package,
            filename => $filename,
            line     => $line,
          };
          bless { @_ }, "My::Exception::$exception";
        };
      goto &{$AUTOLOAD};
    }
    else {
      CORE::die "No such exception class: $AUTOLOAD\n";
    }
  }

  1;
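As a quick sanity check, here is how the complete class behaves end to
end. This sketch assumes the module above is installed as
My/Exception.pm; the C<Timeout> exception type and the messages are
made up for the demonstration:

  use My::Exception;   # no import list: overrides die() globally

  eval {
    open my $fh, '<', '/no/such/file'
        or die "open failed: $!\n";   # a plain string die...
  };
  print ref $@, "\n";   # ...arrives as My::Exception::UnCaught
  print "$@";           # stringifies back to the original message

  eval {
    die My::Exception->Timeout( seconds => 10 );  # invented type
  };
  print ref $@, "\n";              # My::Exception::Timeout
  print $@->{caller}{line}, "\n";  # where it was thrown from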
=head2 Other Implementations

Some users might find it very useful to have the more C++/Java-like
interface of try/catch functions. These are available in several forms
that all work in slightly different ways. See the documentation for
each module for details:

=over

=item * Error.pm

Graham Barr's excellent OO-styled "try, throw, catch" module (from
L<CPAN|download::third_party/Perl>). This should be considered your
best option for structured exception handling, because it is well
known, well supported, and used by a lot of other applications.

=item * Exception::Class and Devel::StackTrace

by Dave Rolsky, both available from CPAN of course.

C<Exception::Class> is a bit cleaner than the C<AUTOLOAD> method from
above, as it can catch typos in exception class names, whereas the
method above will automatically create a new class for you. In
addition, it lets you create actual class hierarchies for your
exceptions, which can be useful if you want to create exception
classes that provide extra methods or data. For example, an exception
class for database errors could provide a method for returning the SQL
and bound parameters in use at the time of the error.

=item * Try.pm

Tony Olekshy's module. Adds an unwind stack and some other interesting
features. Not on the CPAN. Available at
http://www.avrasoft.com/perl/rfc/try-1136.zip

=back

=head1 Customized __DIE__ Handler

As we saw in the previous sections, it's a bad idea to do:

  require Carp;
  $SIG{__DIE__} = \&Carp::confess;

since it breaks the error propagation within C<eval {}> blocks. But
starting from perl 5.6.x you can use another solution to trace errors.
Suppose you get the error:

  "exit" is not exported by the GLOB(0x88414cc) module at (eval 397) line 1

and you have no clue where it comes from; you can override the die()
function and plug the tracer inside:

  require Carp;
  use subs qw(CORE::GLOBAL::die);
  *CORE::GLOBAL::die = sub {
    if ($_[0] =~ /"exit" is not exported/) {
      local *CORE::GLOBAL::die = sub { CORE::die(@_) };
      Carp::confess(@_); # Carp uses die() internally!
    }
    else {
      CORE::die(@_); # could write &CORE::die to forward @_
    }
  };

Now we can test that it works properly, without breaking the error
propagation of C<eval {}> blocks:

  eval { foo(); };
  warn $@ if $@;

  print "\n";

  eval { poo(); };
  warn $@ if $@;

  sub foo { bar(); }
  sub bar { die qq{"exit" is not exported} }

  sub poo { tar(); }
  sub tar { die "normal exit" }

This prints:

  $ perl -w test
  Subroutine die redefined at test line 5.
  "exit" is not exported at test line 6
          main::__ANON__('"exit" is not exported') called at test line 17
          main::bar() called at test line 16
          main::foo() called at test line 12
          eval {...} called at test line 12

  normal exit at test line 5.

The 'local' in:

  local *CORE::GLOBAL::die = sub { CORE::die(@_) };

is important, so you won't lose the overloaded C<CORE::GLOBAL::die>.

=head1 Maintainers

Maintainer is the person(s) you should contact with updates,
corrections and patches.

=over

=item * Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

=head1 Authors

=over

=item * Stas Bekman E<lt>stas (at) stason.orgE<gt>

=item * Matt Sergeant E<lt>matt (at) sergeant.orgE<gt>

=back

Only the major authors are listed above. For contributors see the
Changes file.

=cut