stas 02/05/10 00:45:11 Modified: src/docs/general config.cfg cvs_howto.pod Added: src/docs/general .cvsignore advocacy.pod control.pod hardware.pod multiuser.pod perl_myth.pod perl_reference.pod Log: docs common to all mod_perl versions Submitted by: Thomas Klausner <[EMAIL PROTECTED]> Revision Changes Path 1.2 +9 -2 modperl-docs/src/docs/general/config.cfg Index: config.cfg =================================================================== RCS file: /home/cvs/modperl-docs/src/docs/general/config.cfg,v retrieving revision 1.1 retrieving revision 1.2 diff -u -r1.1 -r1.2 --- config.cfg 29 Apr 2002 16:48:06 -0000 1.1 +++ config.cfg 10 May 2002 07:45:11 -0000 1.2 @@ -6,11 +6,18 @@ title => "General Documentation", abstract => <<EOB, -Here you can find documentation not directly concerned with mod_perl, -but still usefull for most mod_perl projects. +Here you can find documentation concerning mod_perl in general, +but also not strictly mod_perl related information that is still +very usefull for working with mod_perl. EOB chapters => [qw( + perl_reference.pod + multiuser.pod + hardware.pod + control.pod + advocacy.pod + perl_myth.pod cvs_howto.pod Changes.pod )], 1.3 +1 -1 modperl-docs/src/docs/general/cvs_howto.pod Index: cvs_howto.pod =================================================================== RCS file: /home/cvs/modperl-docs/src/docs/general/cvs_howto.pod,v retrieving revision 1.2 retrieving revision 1.3 diff -u -r1.2 -r1.3 --- cvs_howto.pod 29 Apr 2002 17:08:17 -0000 1.2 +++ cvs_howto.pod 10 May 2002 07:45:11 -0000 1.3 @@ -36,7 +36,7 @@ % cvs -d ":pserver:[EMAIL PROTECTED]:/home/cvspublic" co modperl After cvs finished downloading the files you will find a new directory -calles I<modperl> in the current working directory. +called I<modperl> in the current working directory. =head2 keeping your copy up to date 1.1 modperl-docs/src/docs/general/.cvsignore Index: .cvsignore =================================================================== cache.*.dat 1.1 modperl-docs/src/docs/general/advocacy.pod Index: advocacy.pod =================================================================== =head1 NAME mod_perl Advocacy =head1 Description Having a hard time getting mod_perl into your organization? We have collected some arguments you can use to convince your boss why the organization wants mod_perl. You can contact the L<mod_perl advocacy list|maillist::list-advocacy> if you have any more questions, or good arguments you have used (any success-stories are also welcome to L<the docs-dev list|maillist::list-docs-dev>). Also see L<Popular Perl Complaints and Myths|docs::general::perl_myth>. =head1 Thoughts about scalability and flexibility Your need for scalability and flexibility depends on what you need from your web site. If you only want a simple guest book or database gateway with no feature headroom, you can get away with any EASY_AND_FAST_TO_DEVELOP_TOOL (Exchange, MS IIS, Lotus Notes, etc). Experience shows that you will soon want more functionality, at which point you'll discover the limitations of these "easy" tools. Gradually, your boss will ask for increasing functionality and at some point you'll realize that the tool lacks flexibility and/or scalability. Then your boss will either buy another EASY_AND_FAST_TO_DEVELOP_WITH_TOOLS and repeat the process (with different unforseen problems), or you'll start investing time in learning how to use a powerful, flexible tool to make the long-term development cycle easier. 
If you and your company are serious about delivering flexible Internet functionality, do your homework. Then urge your boss to invest a little extra time and resources in choosing the right tool for the job. The extra quality and manageability of your site along with your ability to deliver new and improved functionality of high quality and in good time will prove the superiority of using solid flexible tools. =head1 The boss, the developer and advocacy Each developer has a boss who participates in the decision-making process. Remember that the boss considers input from sales people, developers, the media and associates before handing down large decisions. Of course, results count! A sales brochure makes very little impact compared to a working demonstration, and demonstrations of company-specific and developer-specific results count for a lot! Personally, when I discovered mod_perl I did a lot of testing and coding at home and at work. Once I had a working heavy application, I came to my boss with two URLs - one for the plain CGI server and the other for the mod_perl-enabled server. It took about 30 secs for my boss to say: `Go with it'. Of course since then I have had to provide all the support for other developers, which is why I took time to learn it in first place (and why this guide was created!). Chances are that if you've done your homework, learnt the tools and can deliver results, you'll have a successful project. If you convince your boss to try a tool that you don't know very well, your results may suffer. If your boss follows your development process closely and sees that your progress is much worse than expected, you might be told to "forget it" and mod_perl might not get a second chance. Advocacy is a great thing for the open-source software movement, but it's best done quietly until you have confidence that you can show productivity. If you can demonstrate to your boss a heavy CGI which is running much faster under mod_perl, that may be a strong argument for further evaluation. Your company may even sponsor a portion of your learning process. Learn the technology by working on sample projects. Learn how to support yourself and learn how to get support from the community; then advocate your ideas to your boss. Then you'll have the knowledge; your company will have the benefit; and mod_perl will have the reputation it deserves. =head1 A summary of perl/CGI discussion at slashdot.org Well, there was a nice discussion of merits of Perl in CGI world. I took the time to summarize this thread, so here is what I've got: Perl Domination in CGI Programming? http://slashdot.org/askslashdot/99/10/20/1246241.shtml =over 4 =item * Perl is cool and fun to code with. =item * Perl is very fast to develop with. =item * Perl is even faster to develop with if you know what CPAN is. :) =item * Math intensive code and other stuff which is faster in C/C++, can be plugged into Perl with XS/SWIG and may be used transparently by Perl programmers. =item * Most CGI applications do text processing, at which Perl excels =item * Forking and loading (unless the code is shared) of C/C++ CGI programs produces an overhead. =item * Except for Intranets, bandwidth is usually a bigger bottleneck than Perl performance, although this might change in the future. =item * For database driven applications, the database itself is a bottleneck. Lots of posts talk about latency vs throughput. =item * mod_perl, FastCGI, Velocigen and PerlEx all give good performance gains over plain mod_cgi. 
=item * Other light alternatives to Perl and its derivatives which have been mentioned: PHP, Python. =item * There were almost no voices from users of M$ and similar technologies, I guess that's because they don't read http://slashdot.org :) =item * Many said that in many people's minds: 'CGI' eq 'Perl' =back =head1 Maintainers Maintainer is the person(s) you should contact with updates, corrections and patches. =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back =head1 Authors =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back Only the major authors are listed above. For contributors see the Changes file. =cut 1.1 modperl-docs/src/docs/general/control.pod Index: control.pod =================================================================== =head1 NAME Controlling and Monitoring the Server =head1 Description Covers techniques to restart mod_perl enabled Apache, SUID scripts, monitoring, and other maintenance chores, as well as some specific setups. =head1 Restarting Techniques All of these techniques require that you know the server process id (PID). The easiest way to find the PID is to look it up in the I<httpd.pid> file. It's easy to discover where to look, by looking in the I<httpd.conf> file. Open the file and locate the entry C<PidFile>. Here is the line from one of my own I<httpd.conf> files: PidFile /usr/local/var/httpd_perl/run/httpd.pid As you see, with my configuration the file is I</usr/local/var/httpd_perl/run/httpd.pid>. Another way is to use the C<ps> and C<grep> utilities. Assuming that the binary is called I<httpd_perl>, we would do: % ps auxc | grep httpd_perl or maybe: % ps -ef | grep httpd_perl This will produce a list of all the C<httpd_perl> (parent and children) processes. You are looking for the parent process. If you run your server as root, you will easily locate it since it belongs to root. If you run the server as some other user (when you L<don't have root access|guide::install/Installation_Without_Superuser_Privileges>, the processes will belong to that user unless defined differently in I<httpd.conf>. It's still easy to find which is the parent--usually it's the process with the smallest PID. You will see several C<httpd> processes running on your system, but you should never need to send signals to any of them except the parent, whose pid is in the I<PidFile>. There are three signals that you can send to the parent: C<SIGTERM>, C<SIGHUP>, and C<SIGUSR1>. Some folks prefer to specify signals using numerical values, rather than using symbols. If you are looking for these, check out your C<kill(1)> man page. My page points to I</usr/include/linux/signal.h>, the relevant entries are: #define SIGHUP 1 /* hangup, generated when terminal disconnects */ #define SIGKILL 9 /* last resort */ #define SIGTERM 15 /* software termination signal */ #define SIGUSR1 30 /* user defined signal 1 */ Note that to send these signals from the command line the C<SIG> prefix must be omitted and under some operating systems they will need to be preceded by a minus sign, e.g. C<kill -15> or C<kill -TERM> followed by the PID. =head1 Server Stopping and Restarting We will concentrate here on the implications of sending C<TERM>, C<HUP>, and C<USR1> signals (as arguments to kill(1)) to a mod_perl enabled server. See http://www.apache.org/docs/stopping.html for documentation on the implications of sending these signals to a plain Apache server. 
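If you prefer to send these signals from Perl rather than from the shell, the C<kill()> built-in accepts the same signal names without the C<SIG> prefix. Here is a minimal sketch, assuming the I<PidFile> location from the example above (adjust the path and the default signal to your own setup):

  #!/usr/bin/perl -w
  # a minimal sketch: read the parent PID from the PidFile and signal it
  use strict;

  my $pidfile = "/usr/local/var/httpd_perl/run/httpd.pid";
  my $signal  = shift || 'USR1';    # TERM, HUP or USR1

  open PID, $pidfile or die "Cannot open $pidfile: $!";
  my ($pid) = <PID> =~ /(\d+)/ or die "No numeric PID found in $pidfile";
  close PID;

  kill $signal, $pid or die "Cannot send SIG$signal to process $pid: $!";
  print "Sent SIG$signal to the parent process (pid $pid)\n";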
=over 4

=item TERM Signal: Stop Now

Sending the C<TERM> signal to the parent causes it to immediately attempt to kill off all its children. Any requests in progress are terminated, and no further requests are served. This process may take quite a few seconds to complete. To stop a child, the parent sends it a C<SIGHUP> signal. If that fails it sends another. If that fails it sends the C<SIGTERM> signal, and as a last resort it sends the C<SIGKILL> signal. For each failed attempt to kill a child it makes an entry in the I<error_log>.

When all the child processes have terminated, the parent itself exits and any open log files are closed. This is when all the accumulated C<END> blocks are executed, apart from the ones located in scripts running under C<Apache::Registry> or C<Apache::PerlRun> handlers. In the latter case, C<END> blocks are executed after each request is served.

=item HUP Signal: Restart Now

Sending the C<HUP> signal to the parent causes it to kill off its children as if the C<TERM> signal had been sent, i.e. any requests in progress are terminated; but the parent does not exit. Instead, the parent re-reads its configuration files, spawns a new set of child processes and continues to serve requests. It is almost equivalent to stopping and then restarting the server.

If the configuration files contain errors when restart is signaled, the parent will exit, so it is important to check the configuration files for errors before issuing a restart. How to perform the check will be covered shortly.

Sometimes using this approach to restart a mod_perl enabled Apache may cause the processes' memory usage to grow incrementally after each restart. This happens when Perl code loaded in memory is not completely torn down, leading to a memory leak.

=item USR1 Signal: Gracefully Restart Now

The C<USR1> signal causes the parent process to advise the children to exit after serving their current requests, or to exit immediately if they're not serving a request. The parent re-reads its configuration files and re-opens its log files. As each child dies off the parent replaces it with a child from the new generation (the new children use the new configuration) and it begins serving new requests immediately.

The only difference between C<USR1> and C<HUP> is that C<USR1> allows the children to complete any current requests before they are killed off, so there is no interruption in service; with the C<HUP> signal it might take a few seconds for the restart to complete, and no requests are served during that time.

=back

By default, if a server is restarted (using C<kill -USR1 `cat logs/httpd.pid`> or with the C<HUP> signal), Perl scripts and modules are not reloaded. To reload C<PerlRequire>s, C<PerlModule>s, other C<use()>'d modules and flush the C<Apache::Registry> cache, use this directive in I<httpd.conf>:

  PerlFreshRestart On

Make sure you read L<Evil things might happen when using PerlFreshRestart|guide::troubleshooting/Evil_things_might_happen_when_using_PerlFreshRestart>.

=head1 Speeding up the Apache Termination and Restart

We've already mentioned that restart or termination can sometimes take quite a long time (e.g. tens of seconds) for a mod_perl server. The reason for that is a call to the C<perl_destruct()> Perl API function during the child exit phase. This will cause proper execution of C<END> blocks found during server startup and will invoke the C<DESTROY> method on global objects which are still alive.
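To make this concrete, here is a minimal sketch (the module name C<My::CleanupDemo> is made up for illustration) of the kind of code C<perl_destruct()> takes care of. If a module like this is loaded at server startup, the C<warn()> calls in its C<END> block and C<DESTROY> method will show up in the I<error_log> when a child process exits:

  package My::CleanupDemo;
  use strict;

  # a global object that stays alive until the child server shuts down
  use vars qw($global);
  $global = bless { name => 'demo' }, __PACKAGE__;

  sub DESTROY {
      my $self = shift;
      # e.g. disconnect database handles, release locks, etc.
      warn "DESTROY called on '$self->{name}' object in process $$\n";
  }

  END {
      warn "END block executed in process $$\n";
  }

  1;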
It is also possible that this operation may take a long time to finish, causing a long delay during a restart. Sometimes this will be followed by a series of messages appearing in the server I<error_log> file, warning that certain child processes did not exit as expected. This happens when, after a few attempts to advise the child process to quit, the child is still in the middle of C<perl_destruct()>, and a lethal C<KILL> signal is sent, aborting whatever operation the child happened to be executing and I<brutally> killing it.

If your code does not contain any C<END> blocks or C<DESTROY> methods which need to be run during child server shutdown, or if it has them but executing them is not important, this destruction can be avoided by setting the C<PERL_DESTRUCT_LEVEL> environment variable to C<-1>. For example add this setting to the I<httpd.conf> file:

  PerlSetEnv PERL_DESTRUCT_LEVEL -1

What constitutes a significant cleanup? Any change of state outside of the current process that would not be handled by the operating system itself. So committing database transactions and removing the lock on some resource are significant operations, but closing an ordinary file isn't.

=head1 Using apachectl to Control the Server

The Apache distribution comes with a script to control the server. It's called C<apachectl> and it is installed into the same location as the httpd executable. We will assume for the sake of our examples that it's in C</usr/local/sbin/httpd_perl/apachectl>:

To start httpd_perl:

  % /usr/local/sbin/httpd_perl/apachectl start

To stop httpd_perl:

  % /usr/local/sbin/httpd_perl/apachectl stop

To restart httpd_perl (if it is running, send C<SIGHUP>; if it is not already running just start it):

  % /usr/local/sbin/httpd_perl/apachectl restart

Do a graceful restart by sending a C<SIGUSR1>, or start if not running:

  % /usr/local/sbin/httpd_perl/apachectl graceful

To do a configuration test:

  % /usr/local/sbin/httpd_perl/apachectl configtest

Replace C<httpd_perl> with C<httpd_docs> in the above calls to control the C<httpd_docs> server.

There are other options for C<apachectl>; use the C<help> option to see them all.

It's important to remember that C<apachectl> uses the PID file, which is specified by the C<PidFile> directive in I<httpd.conf>. If you delete the PID file by hand while the server is running, C<apachectl> will be unable to stop or restart the server.

=head1 Safe Code Updates on a Live Production Server

You have prepared a new version of code, uploaded it onto a production server, restarted it and it doesn't work. What could be worse than that? You also cannot go back, because you have overwritten the good working code.

It's quite easy to prevent this: just don't overwrite the previous working files!

Personally I do all updates on the live server with the following sequence. Assume that the server root directory is I</home/httpd/perl/rel>. When I'm about to update the files I create a new directory I</home/httpd/perl/beta>, copy the old files from I</home/httpd/perl/rel> into it and update it with the new files. Then I do some last sanity checks (check that file permissions are [read+executable], and run C<perl -c> on the new modules to make sure there are no errors in them). When I think I'm ready I do:

  % cd /home/httpd/perl
  % mv rel old && mv beta rel && stop && sleep 3 && restart && err

Let me explain what this does. Firstly, note that I put all the commands on one line, separated by C<&&>, and only then press the C<Enter> key.
As I am working remotely, this ensures that if I suddenly lose my connection (sadly this happens sometimes) I won't leave the server down if only the C<stop> command squeezed in. C<&&> also ensures that if any command fails, the rest won't be executed. I am using aliases (which I have already defined) to make the typing easier: % alias | grep apachectl graceful /usr/local/apache/bin/apachectl graceful rehup /usr/local/apache/sbin/apachectl restart restart /usr/local/apache/bin/apachectl restart start /usr/local/apache/bin/apachectl start stop /usr/local/apache/bin/apachectl stop % alias err tail -f /usr/local/apache/logs/error_log Taking the line apart piece by piece: mv rel old && back up the working directory to I<old> mv beta rel && put the new one in its place stop && stop the server sleep 3 && give it a few seconds to shut down (it might take even longer) restart && C<restart> the server err view of the tail of the I<error_log> file in order to see that everything is OK C<apachectl> generates the status messages a little too early (e.g. when you issue C<apachectl stop> it says the server has been stopped, while in fact it's still running) so don't rely on it, rely on the C<error_log> file instead. Also notice that I use C<restart> and not just C<start>. I do this because of Apache's potentially long stopping times (it depends on what you do with it of course!). If you use C<start> and Apache hasn't yet released the port it's listening to, the start would fail and C<error_log> would tell you that the port is in use, e.g.: Address already in use: make_sock: could not bind to port 8080 But if you use C<restart>, it will wait for the server to quit and then will cleanly restart it. Now what happens if the new modules are broken? First of all, I see immediately an indication of the problems reported in the C<error_log> file, which I C<tail -f> immediately after a restart command. If there's a problem, I just put everything back as it was before: % mv rel bad && mv old rel && stop && sleep 3 && restart && err Usually everything will be fine, and I have had only about 10 seconds of downtime, which is pretty good! =head1 An Intentional Disabling of Live Scripts What happens if you really must take down the server or disable the scripts? This situation might happen when you need to do some maintenance work on your database server. If you have to take your database down then any scripts that use it will fail. If you do nothing, the user will see either the grey C<An Error has happened> message or perhaps a customized error message if you have added code to trap and customize the errors. See L<Redirecting Errors to the Client instead of to the error_log|guide::snippets/Redirecting_Errors_to_the_Client_Instead_of_error_log> for the latter case. A much friendlier approach is to confess to your users that you are doing some maintenance work and plead for patience, promising (keep the promise!) that the service will become fully functional in X minutes. There are a few ways to do this: The first doesn't require messing with the server. It works when you have to disable a script running under C<Apache::Registry> and relies on the fact that it checks whether the file was modified before using the cached version. Obviously it won't work under other handlers because these serve the compiled version of the code and don't check to see if there was a change in the code on the disk. 
So if you want to disable an C<Apache::Registry> script, prepare a little script like this: /home/http/perl/maintenance.pl ---------------------------- #!/usr/bin/perl -Tw use strict; use CGI; my $q = new CGI; print $q->header, $q->p( "Sorry, the service is temporarily down for maintenance. It will be back in ten to fifteen minutes. Please, bear with us. Thank you!"); So if you now have to disable a script for example C</home/http/perl/chat.pl>, just do this: % mv /home/http/perl/chat.pl /home/http/perl/chat.pl.orig % ln -s /home/http/perl/maintenance.pl /home/http/perl/chat.pl Of course you server configuration should allow symbolic links for this trick to work. Make sure you have the directive Options FollowSymLinks in the C<E<lt>LocationE<gt>> or C<E<lt>DirectoryE<gt>> section of your I<httpd.conf>. When you're done, it's easy to restore the previous setup. Just do this: % mv /home/http/perl/chat.pl.orig /home/http/perl/chat.pl which overwrites the symbolic link. Now make sure that the script will have the current timestamp: % touch /home/http/perl/chat.pl Apache will automatically detect the change and will use the moved script instead. The second approach is to change the server configuration and configure a whole directory to be handled by a C<My::Maintenance> handler (which you must write). For example if you write something like this: My/Maintenance.pm ------------------ package My::Maintenance; use strict; use Apache::Constants qw(:common); sub handler { my $r = shift; print $r->send_http_header("text/plain"); print qq{ We apologize, but this service is temporarily stopped for maintenance. It will be back in ten to fifteen minutes. Please, bear with us. Thank you! }; return OK; } 1; and put it in a directory that is in the server's C<@INC>, to disable all the scripts in Location C</perl> you would replace: <Location /perl> SetHandler perl-script PerlHandler My::Handler [snip] </Location> with <Location /perl> SetHandler perl-script PerlHandler My::Maintenance [snip] </Location> Now restart the server. Your users will be happy to go and read http://slashdot.org for ten minutes, knowing that you are working on a much better version of the service. If you need to disable a location handled by some module, the second approach would work just as well. =head1 SUID Start-up Scripts If you want to allow a few people in your team to start and stop the server you will have to give them the root password, which is not a good thing to do. The less people know the password, the less problems are likely to be encountered. But there is an easy solution for this problem available on UNIX platforms. It's called a setuid executable. =head2 Introduction to SUID Executables The setuid executable has a setuid permissions bit set. This sets the process's effective user ID to that of the file upon execution. You perform this setting with the following command: % chmod u+s filename You probably have used setuid executables before without even knowing about it. For example when you change your password you execute the C<passwd> utility, which among other things modifies the I</etc/passwd> file. In order to change this file you need root permissions, the C<passwd> utility has the setuid bit set, therefore when you execute this utility, its effective ID is the same of the root user ID. You should avoid using setuid executables as a general practice. The less setuid executables you have the less likely that someone will find a way to break into your system, by exploiting some bug you didn't know about. 
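As a quick audit aid, here is a minimal Perl sketch (the directories to scan are only examples) that walks the filesystem and prints every file with the setuid bit set, so you can see how many setuid executables your system really has:

  #!/usr/bin/perl -w
  # a minimal sketch: report files with the setuid bit set
  use strict;
  use File::Find;

  my @dirs = qw(/bin /sbin /usr);

  find(sub {
      return unless -f $_;            # plain files only
      my $mode = (stat _)[2];
      return unless $mode & 04000;    # setuid bit set?
      printf "%04o %s\n", $mode & 07777, $File::Find::name;
  }, @dirs);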
When the executable is setuid to root, you have to make sure that it doesn't have the group and world read and write permissions. If we take a look at the C<passwd> utility we will see:

  % ls -l /usr/bin/passwd
  -r-s--x--x 1 root root 12244 Feb 8 00:20 /usr/bin/passwd

You achieve this with the following command:

  % chmod 4511 filename

The first digit (4) stands for the setuid bit, the second digit (5) is a compound of read (4) and execute (1) permissions for the user, and the third and fourth digits set the execute permissions for the group and the world.

=head2 Apache Startup SUID Script's Security

In our case, we want to allow only a specific group of users to execute the setuid script, all of whom belong to the same group. For the sake of our example we will use the group named I<apache>. It's important that users who aren't root or who don't belong to the I<apache> group are not able to execute this script. Therefore we perform the following commands:

  % chgrp apache apachectl
  % chmod 4510 apachectl

The execution order is important. If you swap the command execution order you will lose the setuid bit.

Now if we look at the file we see:

  % ls -l apachectl
  -r-s--x--- 1 root apache 32 May 13 21:52 apachectl

Now we are all set... Almost...

When you start Apache, Apache and Perl modules are loaded and code can be executed. Since all this happens with the effective ID of root, any code is executed as if it were run by the root user. You should be very careful because, although you haven't given anyone the root password, all the users in the I<apache> group have indirect root access. This means that if Apache loads some module or executes some code that is writable by any of these users, they can plant code that will allow them to gain shell access to the root account and become real root.

Of course if you don't trust your team you shouldn't use this solution in the first place. You can try to check that all the files Apache loads aren't writable by anyone but root, but there are too many of them, especially in the mod_perl case, where many Perl modules are loaded at server startup.

By the way, don't let all this setuid stuff confuse you -- when the parent process is loaded, the child processes are spawned as non-root processes. This section has presented a way to allow non-root users to start the server as the root user; the rest is exactly the same as if you were executing the script as root in the first place.

=head2 Sample Apache Startup SUID Script

Now if you are still with us, here is an example of the setuid Apache startup script.

Note the line marked C<WORKAROUND>, which fixes an obscure error when starting mod_perl enabled Apache by setting the real UID to the effective UID. Without this workaround, a mismatch between the real and the effective UID causes Perl to croak on the C<-e> switch.

Note that you must be using a version of Perl that recognizes and emulates the suid bits in order for this to work. This script will do different things depending on whether it is named C<start_httpd>, C<stop_httpd> or C<restart_httpd>. You can use symbolic links for this purpose.

  suid_apache_ctl
  ---------------
  #!/usr/bin/perl -T
  # These constants will need to be adjusted.
$PID_FILE = '/home/www/logs/httpd.pid'; $HTTPD = '/home/www/httpd -d /home/www'; # These prevent taint warnings while running suid $ENV{PATH}='/bin:/usr/bin'; $ENV{IFS}=''; # This sets the real to the effective ID, and prevents # an obscure error when starting apache/mod_perl $< = $>; # WORKAROUND $( = $) = 0; # set the group to root too # Do different things depending on our name ($name) = $0 =~ m|([^/]+)$|; if ($name eq 'start_httpd') { system $HTTPD and die "Unable to start HTTP"; print "HTTP started.\n"; exit 0; } # extract the process id and confirm that it is numeric $pid = `cat $PID_FILE`; $pid =~ /(\d+)/ or die "PID $pid not numeric"; $pid = $1; if ($name eq 'stop_httpd') { kill 'TERM',$pid or die "Unable to signal HTTP"; print "HTTP stopped.\n"; exit 0; } if ($name eq 'restart_httpd') { kill 'HUP',$pid or die "Unable to signal HTTP"; print "HTTP restarted.\n"; exit 0; } die "Script must be named start_httpd, stop_httpd, or restart_httpd.\n"; =head1 Preparing for Machine Reboot When you run your own development box, it's okay to start the webserver by hand when you need to. On a production system it is possible that the machine the server is running on will have to be rebooted. When the reboot is completed, who is going to remember to start the server? It's easy to forget this task, and what happens if you aren't around when the machine is rebooted? After the server installation is complete, it's important not to forget that you need to put a script to perform the server startup and shutdown into the standard system location, for example I</etc/rc.d> under RedHat Linux, or I</etc/init.d/apache> under Debian Slink Linux. This is the directory which contains scripts to start and stop all the other daemons. The directory and file names vary from one Operating System (OS) to another, and even between different distributions of the same OS. Generally the simplest solution is to copy the C<apachectl> script to your startup directory or create a symbolic link from the startup directory to the C<apachectl> script. You will find C<apachectl> in the same directory as the httpd executable after Apache installation. If you have more than one Apache server you will need a separate script for each one, and of course you will have to rename them so that they can co-exist in the same directories. For example on a RedHat Linux machine with two servers, I have the following setup: /etc/rc.d/init.d/httpd_docs /etc/rc.d/init.d/httpd_perl /etc/rc.d/rc3.d/S91httpd_docs -> ../init.d/httpd_docs /etc/rc.d/rc3.d/S91httpd_perl -> ../init.d/httpd_perl /etc/rc.d/rc6.d/K16httpd_docs -> ../init.d/httpd_docs /etc/rc.d/rc6.d/K16httpd_perl -> ../init.d/httpd_perl The scripts themselves reside in the I</etc/rc.d/init.d> directory. There are symbolic links to these scripts in other directories. The names are the same as the script names but they have numerical prefixes, which are used for executing the scripts in a particular order: the lower numbers are executed earlier. When the system starts (level 3) we want the Apache to be started when almost all of the services are running already, therefore I've used I<S91>. For example if the mod_perl enabled Apache issues a C<connect_on_init()> the SQL server should be started before Apache. When the system shuts down (level 6), Apache should be stopped as one of the first processes, therefore I've used C<K16>. Again if the server does some cleanup processing during the shutdown event and requires third party services to be running (e.g. 
SQL server) it should be stopped before these services. Notice that it's normal for more than one symbolic link to have the same sequence number.

Under RedHat Linux and similar systems, when a machine is booted and its runlevel is set to 3 (multiuser + network), Linux goes into I</etc/rc.d/rc3.d/> and executes the scripts the symbolic links point to with the C<start> argument. When it sees I<S91httpd_perl>, it executes:

  /etc/rc.d/init.d/httpd_perl start

When the machine is shut down, the scripts are executed through links from the I</etc/rc.d/rc6.d/> directory. This time the scripts are called with the C<stop> argument, like this:

  /etc/rc.d/init.d/httpd_perl stop

Most systems have GUI utilities to automate the creation of symbolic links. For example RedHat Linux includes the C<control-panel> utility, which amongst other things includes the C<RunLevel Manager> (which can be invoked directly as either ntsysv(8) or tksysv(8)). This will help you to create the proper symbolic links. Of course before you use it, you should put C<apachectl> or similar scripts into the I<init.d> or equivalent directory. Or you can have a symbolic link to some other location instead.

The simplest approach is to use the chkconfig(8) utility which adds and removes the services for you. The following example shows how to add an I<httpd_perl> startup script to the system.

First move or copy the file into the directory I</etc/rc.d/init.d>:

  % mv httpd_perl /etc/rc.d/init.d

Now open the script in your favorite editor and add the following lines after the main header of the script:

  # Comments to support chkconfig on RedHat Linux
  # chkconfig: 2345 91 16
  # description: mod_perl enabled Apache Server

So now the beginning of the script looks like:

  #!/bin/sh
  #
  # Apache control script designed to allow an easy command line
  # interface to controlling Apache. Written by Marc Slemko,
  # 1997/08/23

  # Comments to support chkconfig on RedHat Linux
  # chkconfig: 2345 91 16
  # description: mod_perl enabled Apache Server

  #
  # The exit codes returned are:
  # ...

Adjust the line:

  # chkconfig: 2345 91 16

to your needs. The above setting says that the script should be started in levels 2, 3, 4, and 5, that its start priority should be 91, and that its stop priority should be 16.

Now all you have to do is to ask C<chkconfig> to configure the startup scripts. Before we do that let's look at what we have:

  % find /etc/rc.d | grep httpd_perl
  /etc/rc.d/init.d/httpd_perl

Which means that we only have the startup script itself. Now we execute:

  % chkconfig --add httpd_perl

and see what has changed:

  % find /etc/rc.d | grep httpd_perl
  /etc/rc.d/init.d/httpd_perl
  /etc/rc.d/rc0.d/K16httpd_perl
  /etc/rc.d/rc1.d/K16httpd_perl
  /etc/rc.d/rc2.d/S91httpd_perl
  /etc/rc.d/rc3.d/S91httpd_perl
  /etc/rc.d/rc4.d/S91httpd_perl
  /etc/rc.d/rc5.d/S91httpd_perl
  /etc/rc.d/rc6.d/K16httpd_perl

As you can see C<chkconfig> created all the symbolic links for us, using the startup and shutdown priorities as specified in the line:

  # chkconfig: 2345 91 16

If for some reason you want to remove the service from the startup scripts, all you have to do is to tell C<chkconfig> to remove the links:

  % chkconfig --del httpd_perl

Now if we look at the files under the directory I</etc/rc.d/> we see again only the script itself:

  % find /etc/rc.d | grep httpd_perl
  /etc/rc.d/init.d/httpd_perl

Of course you may keep the startup script in any other directory as long as you can link to it.
For example if you want to keep this file with all the Apache binaries in I</usr/local/apache/bin>, all you have to do is to provide a symbolic link to this file: % ln -s /usr/local/apache/bin/apachectl /etc/rc.d/init.d/httpd_perl and then: % chkconfig --add httpd_perl Note that in case of using symlinks the link name in I</etc/rc.d/init.d> is what matters and not the name of the script the link points to. =head1 Monitoring the Server. A watchdog. With mod_perl many things can happen to your server. It is possible that the server might die when you are not around. As with any other critical service you need to run some kind of watchdog. One simple solution is to use a slightly modified C<apachectl> script, which I've named I<apache.watchdog>. Call it from the crontab every 30 minutes -- or even every minute -- to make sure the server is up all the time. The crontab entry for 30 minutes intervals: 0,30 * * * * /path/to/the/apache.watchdog >/dev/null 2>&1 The script: #!/bin/sh # this script is a watchdog to see whether the server is online # It tries to restart the server, and if it's # down it sends an email alert to admin # admin's email [EMAIL PROTECTED] # the path to your PID file PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid # the path to your httpd binary, including options if necessary HTTPD=/usr/local/sbin/httpd_perl/httpd_perl # check for pidfile if [ -f $PIDFILE ] ; then PID=`cat $PIDFILE` if kill -0 $PID; then STATUS="httpd (pid $PID) running" RUNNING=1 else STATUS="httpd (pid $PID?) not running" RUNNING=0 fi else STATUS="httpd (no pid file) not running" RUNNING=0 fi if [ $RUNNING -eq 0 ]; then echo "$0 $ARG: httpd not running, trying to start" if $HTTPD ; then echo "$0 $ARG: httpd started" mail $EMAIL -s "$0 $ARG: httpd started" > /dev/null 2>&1 else echo "$0 $ARG: httpd could not be started" mail $EMAIL -s \ "$0 $ARG: httpd could not be started" > /dev/null 2>&1 fi fi Another approach, probably even more practical, is to use the cool C<LWP> Perl package to test the server by trying to fetch some document (script) served by the server. Why is it more practical? Because while the server can be up as a process, it can be stuck and not working. Failing to get the document will trigger restart, and "probably" the problem will go away. Like before we set a cronjob to call this script every few minutes to fetch some very light script. The best thing of course is to call it every minute. Why so often? If your server starts to spin and trash your disk space with multiple error messages filling the I<error_log>, in five minutes you might run out of free disk space which might bring your system to its knees. Chances are that no other child will be able to serve requests, since the system will be too busy writing to the I<error_log> file. Think big--if you are running a heavy service (which is very fast since you are running under mod_perl) adding one more request every minute will not be felt by the server at all. 
So we end up with a crontab entry like this: * * * * * /path/to/the/watchdog.pl >/dev/null 2>&1 And the watchdog itself: #!/usr/bin/perl -wT # untaint $ENV{'PATH'} = '/bin:/usr/bin'; delete @ENV{'IFS', 'CDPATH', 'ENV', 'BASH_ENV'}; use strict; use diagnostics; use URI::URL; use LWP::MediaTypes qw(media_suffix); my $VERSION = '0.01'; use vars qw($ua $proxy); $proxy = ''; require LWP::UserAgent; use HTTP::Status; ###### Config ######## my $test_script_url = 'http://www.example.com:81/perl/test.pl'; my $monitor_email = '[EMAIL PROTECTED]'; my $restart_command = '/usr/local/sbin/httpd_perl/apachectl restart'; my $mail_program = '/usr/lib/sendmail -t -n'; ###################### $ua = new LWP::UserAgent; $ua->agent("$0/watchdog " . $ua->agent); # Uncomment the proxy if you access a machine from behind a firewall # $proxy = "http://www-proxy.com"; $ua->proxy('http', $proxy) if $proxy; # If it returns '1' it means we are alive exit 1 if checkurl($test_script_url); # Houston, we have a problem. # The server seems to be down, try to restart it. my $status = system $restart_command; my $message = ($status == 0) ? "Server was down and successfully restarted!" : "Server is down. Can't restart."; my $subject = ($status == 0) ? "Attention! Webserver restarted" : "Attention! Webserver is down. can't restart"; # email the monitoring person my $to = $monitor_email; my $from = $monitor_email; send_mail($from,$to,$subject,$message); # input: URL to check # output: 1 for success, 0 for failure ####################### sub checkurl{ my ($url) = @_; # Fetch document my $res = $ua->request(HTTP::Request->new(GET => $url)); # Check the result status return 1 if is_success($res->code); # failed return 0; } # end of sub checkurl # send email about the problem ####################### sub send_mail{ my($from,$to,$subject,$messagebody) = @_; open MAIL, "|$mail_program" or die "Can't open a pipe to a $mail_program :$!\n"; print MAIL <<__END_OF_MAIL__; To: $to From: $from Subject: $subject $messagebody __END_OF_MAIL__ close MAIL; } =head1 Running a Server in Single Process Mode Often while developing new code, you will want to run the server in single process mode. See L<Sometimes it works Sometimes it does Not|guide::porting/Sometimes_it_Works__Sometimes_it_Doesn_t> and L<Names collisions with Modules and libs|guide::porting/Name_collisions_with_Modules_and_libs>. Running in single process mode inhibits the server from "daemonizing", and this allows you to run it under the control of a debugger more easily. % /usr/local/sbin/httpd_perl/httpd_perl -X When you use the C<-X> switch the server will run in the foreground of the shell, so you can kill it with I<Ctrl-C>. Note that in C<-X> (single-process) mode the server will run very slowly when fetching images. Note for Netscape users: If you use Netscape while your server is running in single-process mode, HTTP's C<KeepAlive> feature gets in the way. Netscape tries to open multiple connections and keep them open. Because there is only one server process listening, each connection has to time out before the next succeeds. Turn off C<KeepAlive> in I<httpd.conf> to avoid this effect while developing. If you use the image size parameters, Netscape will be able to render the page without the images so you can press the browser's I<STOP> button after a few seconds. In addition you should know that when running with C<-X> you will not see the control messages that the parent server normally writes to the I<error_log> (I<"server started">, I<"server stopped"> etc). 
Since C<httpd -X> causes the server to handle all requests itself, without forking any children, there is no controlling parent to write the status messages. =head1 Starting a Personal Server for Each Developer If you are the only developer working on the specific server:port you have no problems, since you have complete control over the server. However, often you will have a group of developers who need to develop mod_perl scripts and modules concurrently. This means that each developer will want to have control over the server - to kill it, to run it in single server mode, to restart it, etc., as well as having control over the location of the log files, configuration settings like C<MaxClients>, and so on. You I<can> work around this problem by preparing a few I<httpd.conf> files and forcing each developer to use httpd_perl -f /path/to/httpd.conf but I approach it in a different way. I use the C<-Dparameter> startup option of the server. I call my version of the server % http_perl -Dstas In I<httpd.conf> I write: # Personal development Server for stas # stas uses the server running on port 8000 <IfDefine stas> Port 8000 PidFile /usr/local/var/httpd_perl/run/httpd.pid.stas ErrorLog /usr/local/var/httpd_perl/logs/error_log.stas Timeout 300 KeepAlive On MinSpareServers 2 MaxSpareServers 2 StartServers 1 MaxClients 3 MaxRequestsPerChild 15 </IfDefine> # Personal development Server for userfoo # userfoo uses the server running on port 8001 <IfDefine userfoo> Port 8001 PidFile /usr/local/var/httpd_perl/run/httpd.pid.userfoo ErrorLog /usr/local/var/httpd_perl/logs/error_log.userfoo Timeout 300 KeepAlive Off MinSpareServers 1 MaxSpareServers 2 StartServers 1 MaxClients 5 MaxRequestsPerChild 0 </IfDefine> With this technique we have achieved full control over start/stop, number of children, a separate error log file, and port selection for each server. This saves Stas from getting called every few minutes by Eric: "Stas, I'm going to restart the server". In the above technique, you need to discover the PID of your parent C<httpd_perl> process, which is written in C</usr/local/var/httpd_perl/run/httpd.pid.stas> (and the same for the user I<eric>). To make things even easier we change the I<apachectl> script to do the work for us. We make a copy for each developer called B<apachectl.username> and we change two lines in each script: PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid.username HTTPD='/usr/local/sbin/httpd_perl/httpd_perl -Dusername' So for the user I<stas> we prepare a startup script called I<apachectl.stas> and we change these two lines in the standard apachectl script as it comes unmodified from Apache distribution. PIDFILE=/usr/local/var/httpd_perl/run/httpd.pid.stas HTTPD='/usr/local/sbin/httpd_perl/httpd_perl -Dstas' So now when user I<stas> wants to stop the server he will execute: apachectl.stas stop And to start: apachectl.stas start Certainly the rest of the C<apachectl> arguments apply as before. You might think about having only one C<apachectl> and know who is calling by checking the UID, but since you have to be root to start the server it is not possible, unless you make the setuid bit on this script, as we've explained in the beginning of this chapter. If you do so, you can have a single C<apachectl> script for all developers, after you modify it to automatically find out the UID of the user, who executes the script and set the right paths. 
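Here is a minimal sketch of what such a unified script could look like, assuming the per-user file naming and C<-Dusername> convention used in this section (the paths are only examples). It uses the real UID to discover who invoked it and derives the right PID file and server defines from that:

  #!/usr/bin/perl -Tw
  # a minimal sketch of a single setuid control script for all developers
  use strict;

  # prevent taint warnings when running suid
  $ENV{PATH} = '/bin:/usr/bin';
  delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

  # the real UID tells us who invoked the script, even when it runs setuid root
  my $username = getpwuid($<) or die "Cannot resolve UID $<";

  # the same WORKAROUND as in the setuid startup script shown earlier
  $< = $>;

  my $pidfile = "/usr/local/var/httpd_perl/run/httpd.pid.$username";
  my $httpd   = "/usr/local/sbin/httpd_perl/httpd_perl -D$username";

  my $action = shift || '';
  if ($action eq 'start') {
      system $httpd and die "Unable to start the server for $username";
      print "Server for $username started.\n";
  }
  elsif ($action eq 'stop') {
      open PID, $pidfile or die "Cannot open $pidfile: $!";
      my ($pid) = <PID> =~ /(\d+)/ or die "No numeric PID in $pidfile";
      close PID;
      kill 'TERM', $pid or die "Unable to stop the server for $username";
      print "Server for $username stopped.\n";
  }
  else {
      die "Usage: $0 [start|stop]\n";
  }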
The last thing is to provide developers with an option to run in single process mode by: /usr/local/sbin/httpd_perl/httpd_perl -Dstas -X In addition to making life easier, we decided to use relative links everywhere in the static documents, including the calls to CGIs. You may ask how using relative links will get to the right server port. It's very simple, we use C<mod_rewrite>. To use mod_rewrite you have to configure your I<httpd_docs> server with C<--enable-module=rewrite> and recompile, or use DSO and load the module in I<httpd.conf>. In the I<httpd.conf> of our C<httpd_docs> server we have the following code: RewriteEngine on # stas's server # port = 8000 RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl) RewriteCond %{REMOTE_ADDR} 123.34.45.56 RewriteRule ^(.*) http://example.com:8000/$1 [P,L] # eric's server # port = 8001 RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl) RewriteCond %{REMOTE_ADDR} 123.34.45.57 RewriteRule ^(.*) http://example.com:8001/$1 [P,L] # all the rest RewriteCond %{REQUEST_URI} ^/(perl|cgi-perl) RewriteRule ^(.*) http://example.com:81/$1 [P] The IP addresses are the addresses of the developer desktop machines (where they are running their web browsers). So if an html file includes a relative URI I</perl/test.pl> or even I<http://www.example.com/perl/test.pl>, clicking on the link will be internally proxied to http://www.example.com:8000/perl/test.pl if the click has been made at the user I<stas>'s desktop machine, or to I<http://www.example.com:8001/perl/test.pl> for a request generated from the user I<eric>'s machine, per our above URI rewrite example. Another possibility is to use C<REMOTE_USER> variable if all the developers are forced to authenticate themselves before they can access the server. If you do, you will have to change the C<RewriteRule>s to match C<REMOTE_USER> in the above example. We wish to stress again, that the above setup will work only with relative URIs in the HTML code. If you choose to generate full URIs including non-80 port the requests originated from this HTML code will bypass the light server listening to the default port 80, and go directly to the I<server:port> of the full URI. =head1 Wrapper to Emulate the Server Perl Environment Often you will start off debugging your script by running it from your favorite shell program. Sometimes you encounter a very weird situation when the script runs from the shell but dies when processed as a CGI script by a web-server. The real problem often lies in the difference between the environment variables that is used by your web-server and the ones used by your shell program. For example you may have a set of non-standard Perl directories, used for local Perl modules. You have to tell the Perl interpreter where these directories are. If you don't want to modify C<@INC> in all scripts and modules, you can use a C<PERL5LIB> environment variable, to tell Perl where the directories are. But then you might forget to alter the mod_perl startup script to correct C<@INC> there as well. And if you forget this, you can be quite puzzled why the scripts are running from the shell program, but not from the web. Of course the I<error_log> will help as well to find out what the problem is, but there can be other obscure cases, where you do something different at the shell program and your scripts refuse to run under the web-server. Another example is when you have more than one version of Perl installed. 
You might call one version of the Perl executable in the script's first line (the shebang line), while the web server is compiled with another Perl version. Since mod_perl ignores the path to the Perl executable on the first line of the script, you can get quite confused when the code doesn't behave the same way when processed as a request as it does when executed from the command line. It can take a while before you realize that you are testing the scripts from the shell program using the I<wrong> Perl version.

The best debugging approach is to write a wrapper that emulates the exact environment of the server, first deleting environment variables like C<PERL5LIB> and then calling the same perl binary that is used by the server. Next, set the environment identical to the server's by copying the Perl run directives from the server startup and configuration files, or even by I<require()>'ing the startup file, as long as it doesn't use C<Apache::> modules, which are unavailable under the shell. This will also allow you to completely remove the first line of the script, since mod_perl doesn't need it anyway and the wrapper knows how to call the script.

Here is an example of such a script. Note that we force the use of C<-Tw> when we call the real script. When debugging we want to make sure that the code works with taint mode on, and we want to see all the warnings, to help Perl help us write better code. We have also added the ability to pass parameters, which does not happen when you issue a request to the script, but can be helpful at times.

  #!/usr/bin/perl -w

  # This is a wrapper example
  # It simulates the web server environment by setting @INC and other
  # stuff, so what will run under this wrapper will run under Web and
  # vice versa.
  #
  # Usage: wrap.pl some_cgi.pl
  #
  BEGIN {
    # we want to make a complete emulation, so we must reset all the
    # paths and add the standard Perl libs
    @INC =
      qw(/usr/lib/perl5/5.00503/i386-linux
         /usr/lib/perl5/5.00503
         /usr/lib/perl5/site_perl/5.005/i386-linux
         /usr/lib/perl5/site_perl/5.005
         .
        );
  }

  use strict;
  use File::Basename;

  # process the passed params
  my $cgi = shift || '';
  my $params = (@ARGV) ? join(" ", @ARGV) : '';

  die "Usage:\n\t$0 some_cgi.pl\n" unless $cgi;

  # Set the environment
  my $PERL5LIB = join ":", @INC;

  # if the path includes the directory
  # we extract it and chdir there
  if (index($cgi,'/') >= 0) {
    my $dirname = dirname($cgi);
    chdir $dirname or die "Can't chdir to $dirname: $!\n";
    $cgi =~ m|$dirname/(.*)|;
    $cgi = $1;
  }

  # run the cgi from the script's directory
  # Note that we set Warning and Taint modes ON!!!
  system qq{/usr/bin/perl -I$PERL5LIB -Tw $cgi $params};

=head1 Server Maintenance Chores

It's not enough to have your server and service up and running. You have to maintain the server even when everything seems to be fine. This includes security auditing, keeping an eye on the amount of remaining unused disk space, available RAM, the load of the system, etc.

If you forget about these chores, one day (sooner or later) your system will crash either because it has run out of free disk space, all the available CPU has been used and the system has started to swap heavily, or someone has broken in. Unfortunately the scope of this guide does not cover the latter, since it would take more than one book to cover this issue thoroughly, but the rest of these problems are quite easy to prevent if you follow our advice.
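The disk space chore, for example, lends itself to the same cron-driven watchdog style used elsewhere in this chapter. Below is a minimal sketch which parses C<df -k> output and mails a warning when any filesystem passes a usage threshold; the threshold, mail program and recipient address are assumptions you will want to adjust:

  #!/usr/bin/perl -w
  # a minimal sketch: warn by email when a filesystem is nearly full
  use strict;

  my $threshold    = 90;                        # percent used
  my $mail_program = '/usr/lib/sendmail -t -n';
  my $admin        = 'root@localhost';

  my @full;
  for (`df -k`) {
      # e.g. "/dev/hda1  2016044  1822084  91540  95% /"
      my ($fs, $capacity, $mount) = (split)[0, 4, 5];
      next unless defined $capacity and $capacity =~ /^(\d+)%$/;
      push @full, "$fs ($mount) is ${1}% full" if $1 >= $threshold;
  }

  if (@full) {
      open MAIL, "|$mail_program"
          or die "Can't open a pipe to $mail_program: $!";
      print MAIL "To: $admin\nSubject: low disk space warning\n\n",
                 join("\n", @full), "\n";
      close MAIL;
  }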
Certainly, your particular system might have maintenance chores that aren't covered here, but at least you will be alerted that these chores are real and should be taken care of. =head2 Handling Log Files There are two issues to solve with log files. First they should be rotated and compressed on the constant basis, since they tend to use big parts of the disk space over time. Second these should be monitored for possible sudden explosive growth rates, when something goes astray in your code running at the mod_perl server and the process starts to log thousands of error messages in second without stopping, until all the disk space is used, and the server cannot work anymore. =head3 Log Rotation The first issue is solved by having a process run by crontab at certain times (usually off hours, if this term is still valid in the Internet era) and rotate the logs. The log rotation includes the current log file renaming, server restart (which creates a fresh new log file), and renamed file compression and/or moving it on a different disk. For example if we want to rotate the I<access_log> file we could do: % mv access_log access_log.renamed % apachectl restart % sleep 5; # allow all children to complete requests and logging # now it's safe to use access_log.renamed % mv access_log.renamed /some/directory/on/another/disk This is the script that we run from the crontab to rotate the log files: #!/usr/local/bin/perl -Tw # This script does log rotation. Called from crontab. use strict; $ENV{PATH}='/bin:/usr/bin'; ### configuration my @logfiles = qw(access_log error_log); umask 0; my $server = "httpd_perl"; my $logs_dir = "/usr/local/var/$server/logs"; my $restart_command = "/usr/local/sbin/$server/apachectl restart"; my $gzip_exec = "/usr/bin/gzip"; my ($sec,$min,$hour,$mday,$mon,$year) = localtime(time); my $time = sprintf "%0.4d.%0.2d.%0.2d-%0.2d.%0.2d.%0.2d", $year+1900,++$mon,$mday,$hour,$min,$sec; $^I = ".$time"; # rename log files chdir $logs_dir; @ARGV = @logfiles; while (<>) { close ARGV; } # now restart the server so the logs will be restarted system $restart_command; # allow all children to complete requests and logging sleep 5; # compress log files foreach (@logfiles) { system "$gzip_exec $_.$time"; } Note: Setting C<$^I> sets the in-place edit flag to a dot followed by the time. We copy the names of the logfiles into C<@ARGV>, and open each in turn and immediately close them without doing any changes; but because the in-place edit flag is set they are effectively renamed. As you see the rotated files will include the date and the time in their filenames. Here is a more generic set of scripts for log rotation. 
A cron job fires off a setuid script called I<log-roller> that looks like this:

  #!/usr/bin/perl -Tw
  use strict;
  use File::Basename;

  $ENV{PATH} = "/usr/ucb:/bin:/usr/bin";

  my $ROOT = "/WWW/apache";           # names are relative to this
  my $CONF = "$ROOT/conf/httpd.conf"; # master conf
  my $MIDNIGHT = "MIDNIGHT";          # name of program in each logdir

  my ($user_id, $group_id, $pidfile); # will be set during parse of conf
  die "not running as root" if $>;

  chdir $ROOT or die "Cannot chdir $ROOT: $!";

  my %midnights;
  open CONF, "<$CONF" or die "Cannot open $CONF: $!";
  while (<CONF>) {
    if (/^User (\w+)/i) { $user_id = getpwnam($1); next; }
    if (/^Group (\w+)/i) { $group_id = getgrnam($1); next; }
    if (/^PidFile (.*)/i) { $pidfile = $1; next; }
    next unless /^ErrorLog (.*)/i;
    my $midnight = (dirname $1)."/$MIDNIGHT";
    next unless -x $midnight;
    $midnights{$midnight}++;
  }
  close CONF;

  die "missing User definition" unless defined $user_id;
  die "missing Group definition" unless defined $group_id;
  die "missing PidFile definition" unless defined $pidfile;

  open PID, $pidfile or die "Cannot open $pidfile: $!";
  <PID> =~ /(\d+)/;
  my $httpd_pid = $1;
  close PID;
  die "missing pid definition" unless defined $httpd_pid and $httpd_pid;
  kill 0, $httpd_pid or die "cannot find pid $httpd_pid: $!";

  for (sort keys %midnights) {
    defined(my $pid = fork) or die "cannot fork: $!";
    if ($pid) {
      ## parent:
      waitpid $pid, 0;
    } else {
      my $dir = dirname $_;
      ($(,$)) = ($group_id,$group_id);
      ($<,$>) = ($user_id,$user_id);
      chdir $dir or die "cannot chdir $dir: $!";
      exec "./$MIDNIGHT";
      die "cannot exec $MIDNIGHT: $!";
    }
  }

  kill 1, $httpd_pid or die "Cannot SIGHUP $httpd_pid: $!";

And then individual C<MIDNIGHT> scripts can look like this:

  #!/usr/bin/perl -Tw
  use strict;
  die "bad guy" unless getpwuid($<) =~ /^(root|nobody)$/;
  my @LOGFILES = qw(access_log error_log);
  umask 0;
  $^I = ".".time;
  @ARGV = @LOGFILES;
  while (<>) {
    close ARGV;
  }

Can you spot the security holes? Take your time... This code shouldn't be used in hostile situations.

=head3 Non-Scheduled Emergency Log Rotation

As we have mentioned before, there are times when the web server goes wild and starts to log lots of messages to the I<error_log> file non-stop. If no one monitors this, it is possible that in a few minutes all the free disk space will be filled and no process will be able to work normally. When this happens, the I/O the faulty server causes is so heavy that its sibling processes cannot serve requests.

Generally this is not the case, but a few people have reported encountering this problem. If you are one of these people, you should run a monitoring program that checks the log file size and, if it notices that the file has grown too large, attempts to restart the server and probably trims the log file.

When we were using a quite old mod_perl version, we sometimes had bursts of the error I<Callback called exit> showing up in our I<error_log>. The file could grow to 300 Mbytes in a few minutes.

Below is an example of a script that can be executed from the crontab to handle situations like this. The cron job should run every few minutes or even every minute, since if you experience this problem you know that log files fill up very fast. The example script will rotate the I<error_log> when it grows over 100K. Note that this script is useful even when you have the normal scheduled log rotation facility working; remember that this one is an emergency solver and is not to be used for routine log rotation.
  emergency_rotate.sh
  -------------------
  #!/bin/sh
  S=`ls -s /usr/local/apache/logs/error_log | awk '{print $1}'`
  if [ "$S" -gt 100000 ] ; then
      mv /usr/local/apache/logs/error_log /usr/local/apache/logs/error_log.old
      /etc/rc.d/init.d/httpd restart
      date | /bin/mail -s "error_log $S kB on inx" [EMAIL PROTECTED]
  fi

Of course you could write a more advanced script, using timestamps and other bells and whistles. This example merely illustrates how to solve the problem in question.

Another solution is to use off-the-shelf tools that are written for this purpose. The C<daemontools> package (ftp://koobera.math.uic.edu/www/daemontools.html) includes a utility called C<multilog>. This utility saves the stdin stream to one or more log files. It optionally timestamps each line and, for each log, includes or excludes lines matching specified patterns. It automatically rotates logs to limit the amount of disk space used. If the disk fills up, it pauses and tries again, without losing any data.

The obvious caveat is that it doesn't restart the server, so while it solves the log file handling problem it doesn't deal with the originator of the problem. But since the I/O of the faulty Apache process writing the log will be quite heavy, the rest of the servers will work very slowly, if at all, and a normal watchdog should detect this abnormal situation and restart the Apache server.

=head1 Swapping Prevention

Before I delve into the details of the swapping process, let's refresh our knowledge of memory components and memory management.

The computer memory is called RAM, which stands for Random Access Memory. Reading and writing to RAM is a few orders of magnitude faster than doing the same operations on a hard disk, since the former uses non-movable memory cells while the latter uses rotating magnetic media.

On most operating systems swap memory is used as an extension of RAM and not as a duplication of it. So if your OS is one of those, and you have 128MB of RAM and a 256MB swap partition, you have a total of 384MB of memory available. You should never count on the extra memory when you decide on the maximum number of processes to be run, and I will show why in a moment.

The swap memory can be built from a number of hard disk partitions and swap files formatted for use as swap memory. When you need more swap memory you can always extend it on demand as long as you have some free disk space (for more information see the I<mkswap> and I<swapon> manpages).

System memory is quantified in units called memory pages. Usually the size of a memory page is between 1KB and 8KB. So if you have 256MB of RAM installed on your machine and the page size is 4KB, your system has about 65,000 main memory pages to work with, and these pages are fast. If you also have a 256MB swap partition, the system can use yet another 65,000 memory pages, but they are much slower.

When the system is started, all memory pages are available for use by the programs (processes). Unless the program is really small, the process running this program uses only a few segments of the program, each segment mapped onto its own memory page. Therefore only a few memory pages need to be loaded into memory initially.

When the process needs an additional segment of the program to be loaded into memory, it asks the system whether the page containing this segment is already loaded.
If the page is not found, an event known as a I<page fault> occurs. It requires the system to allocate a free memory page, go to the disk, and read the requested segment of the program into the allocated memory page.

If a process needs to bring a new page into physical memory and there are no free physical pages available, the operating system must make room for this page by discarding another page from physical memory.

If the page to be discarded from physical memory came from an image or data file and has not been written to, then the page does not need to be saved. Instead it can be discarded, and if the process needs that page again it can be brought back into memory from the image or data file.

However, if the page has been modified, the operating system must preserve the contents of that page so that it can be accessed at a later time. This type of page is known as a I<dirty page>, and when it is removed from memory it is saved in a special sort of file called the swap file. This process is referred to as I<swapping out>.

Accesses to the swap file are very slow relative to the speed of the processor and physical memory, and the operating system must juggle the need to write pages to disk with the need to retain them in memory to be used again.

To improve the swapping-out process, i.e. to decrease the chance that a page which has just been swapped out will be needed again the next moment, the LRU (least recently used) or a similar algorithm is used.

To summarize the two swapping scenarios: discarding read-only pages incurs no overhead, in contrast to discarding data pages that have been written to, since in the latter case the pages have to be written to a swap partition located on the slow disk. Therefore your machine's overall performance will be much better if fewer memory pages can become dirty.

But here is the problem: Perl is a language with no strong data types, which means that both the program code and the program data are seen by the OS as data pages, since both are mapped to the same kind of memory pages. Therefore a big chunk of your Perl code becomes dirty when its variables are modified, and when those pages need to be discarded they have to be written to the swap partition.

This leads us to two important conclusions about swapping and Perl.

=over

=item * Running your system with no free main memory available hinders performance, because processes' memory pages have to be discarded and then reread from disk again and again.

=item * Since the majority of the running code is Perl code, in addition to the overhead of reading in the previously discarded pages, there is the overhead of writing dirty pages out to the swap partition.

=back

When the system has to swap memory pages in and out, it slows down, not serving the processes as fast as before. This leads to an accumulation of processes waiting for their turn to run, which further increases processing demands, which in turn slows down the system even more as more memory is required. This ever-worsening spiral will bring the machine to a halt, unless the resource demand suddenly drops and allows the processes to catch up with their tasks and return to normal memory usage.

In addition it's important to know that, for better performance, most programs, particularly programs written in Perl, on most modern OSs don't return memory pages to the OS while they are running.
If some of the memory gets freed, it is reused when needed by the process, without the additional overhead of asking the system to allocate new memory pages. That's why you will observe that Perl programs grow in size as they run and almost never shrink.

When the process quits, it returns its memory pages to the pool of freely available pages for other processes to use.

This scenario is certainly educational, and it should now be obvious that the system running your web server should never swap. It's absolutely normal for your desktop machine to start swapping: you will notice it immediately, since things slow down and sometimes the system freezes for short periods; and, as I've just mentioned, you can stop starting new programs and quit some running ones, thus allowing the system to catch up with the load and go back to using just RAM. In the case of the web server you have much less control, since it's your users who load the machine by issuing requests to your server.

Therefore you should configure the server so that the maximum number of possible processes is small enough, using the C<MaxClients> directive (for the technique for choosing the right C<MaxClients> refer to the section 'L<Choosing MaxClients|guide::performance/Choosing_MaxClients>'). This will ensure that at peak hours the system won't swap. Remember that swap space is an emergency pool, not a resource to be used routinely. If you are low on memory and you badly need it, buy more, or reduce the number of processes to prevent swapping.

However, sometimes, due to faulty code, a process might start spinning in an unconstrained loop, consuming all the available RAM and then heavily using swap memory. In such a situation it helps to have a big emergency pool (i.e. lots of swap memory), but you have to resolve the problem as soon as possible since this pool won't last for long. In the meantime the C<Apache::Resource> module can be handy.

For swapping monitoring techniques see the section 'L<Apache::VMonitor -- Visual System and Apache Server Monitor|guide::debug/Apache__VMonitor____Visual_System_and_Apache_Server_Monitor>'.

=head1 Preventing mod_perl Processes From Going Wild

Sometimes people report that code running under mod_perl has caused all the RAM or all the disk space to be used up. The following tips should help you prevent these problems before they hit you, if they ever do.

=head2 All RAM Consumed

Sometimes calling an undefined subroutine in a module can cause a tight loop that consumes all the available memory. Here is a way to catch such errors. Define an C<UNIVERSAL::AUTOLOAD> subroutine in your I<startup.pl>, or in a C<E<lt>PerlE<gt>> ... C<E<lt>/PerlE<gt>> section in your I<httpd.conf> file:

  sub UNIVERSAL::AUTOLOAD {
    my $class = shift;
    warn "$class can't \$UNIVERSAL::AUTOLOAD=$UNIVERSAL::AUTOLOAD!\n";
  }

I prefer the I<httpd.conf> approach. Putting it in all your mod_perl modules would be redundant (and might give you compile-time errors).

This will produce a nice error in I<error_log>, giving the line number of the call and the name of the undefined subroutine.

=head1 Maintainers

Maintainer is the person(s) you should contact with updates, corrections and patches.

=over

=item * Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

=head1 Authors

=over

=item * Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

Only the major authors are listed above. For contributors see the Changes file.
=cut

1.1                  modperl-docs/src/docs/general/hardware.pod

Index: hardware.pod
===================================================================

=head1 NAME

Choosing an Operating System and Hardware

=head1 Description

Before you use the techniques documented on this site to tune servers and write code, you need to consider the demands which will be placed on the hardware and the operating system. There is no point in investing a lot of time and money in configuration and coding only to find that your server's performance is poor because you did not choose a suitable platform in the first place.

While the tips below could apply to many web servers, they are aimed primarily at administrators of mod_perl-enabled Apache servers.

Because hardware platforms and operating systems are developing rapidly (even while you are reading this document), this discussion must be in general terms.

=head1 Choosing an Operating System

First, let's talk about Operating Systems (OSs).

Most of the time I prefer to use Linux or something from the *BSD family. Although I am personally a Linux devotee, I do not want to start yet another OS war. I will try to talk about the characteristics and features you should be looking for to support an Apache/mod_perl server; once you know what you want from your OS, you can go out and find it. Visit the Web sites of the operating systems you are interested in. You can gauge users' opinions by searching the relevant discussions in newsgroup and mailing list archives. Deja - http://deja.com and eGroups - http://egroups.com are good examples. I will leave this fan research to the reader.

=head2 Stability and Robustness

Probably the most important features of an OS are stability and robustness. You are in an Internet business. You do not keep normal 9am to 5pm working hours like many conventional businesses you know. You are open 24 hours a day. You cannot afford to be off-line, or your customers will go and shop at another service like yours (unless you have a monopoly :). If the OS of your choice crashes every day, first do a little investigation. There might be a simple reason which you can find and fix. However, there are OSs which won't work unless you reboot them twice a day. You don't want to use an OS of this kind, no matter how good the OS vendor's sales department is. Do not follow flashy advertisements; follow developers' advice instead.

Generally, people who have used an OS for some time can tell you a lot about its stability. Ask them. Try to find people who are doing similar things to what you are planning to do; they may even be using the same software. There are often compatibility issues to resolve. You may need to become familiar with patching and compiling your OS. It's easy.

=head2 Memory Management

You want an OS with good memory management; some OSs are well known as memory hogs. The same code can use twice as much memory on one OS compared to another. If the size of the mod_perl process is 10Mb and you have tens of these running, it definitely adds up!

=head2 Memory Leaks

Some OSs and/or their libraries (e.g. C runtime libraries) suffer from memory leaks. A leak is when some process requests a chunk of memory for temporary storage but then does not subsequently release it. The chunk of memory is then not available for any purpose until the process which requested it dies. We cannot afford such leaks. A single mod_perl process sometimes serves thousands of requests before it terminates, so if a leak occurs on every request, the memory demands could become huge.
Of course our code can be the cause of the memory leaks as well (check out the C<Apache::Leak> module on CPAN). Certainly, we can reduce the number of requests to be served over the process' life, but that can degrade performance. =head2 Sharing Memory We want an OS with good memory sharing capabilities. As we have seen, if we preload the modules and scripts at server startup, they are shared between the spawned children (at least for a part of a process' life - memory pages can become "dirty" and cease to be shared). This feature can reduce memory consumption a lot! =head2 Cost and Support If we are in a big business we probably do not mind paying another $1000 for some fancy OS with bundled support. But if our resources are low, we will look for cheaper and free OSs. Free does not mean bad, it can be quite the opposite. Free OSs can have the best support we can find. Some do. It is very easy to understand - most of the people are not rich and will try to use a cheaper or free OS first if it does the work for them. Since it really fits their needs, many people keep using it and eventually know it well enough to be able to provide support for others in trouble. Why would they do this for free? One reason is for the spirit of the first days of the Internet, when there was no commercial Internet and people helped each other, because someone helped them in first place. I was there, I was touched by that spirit and I am keen to keep that spirit alive. But, let's get back to our world. We are living in material world, and our bosses pay us to keep the systems running. So if you feel that you cannot provide the support yourself and you do not trust the available free resources, you must pay for an OS backed by a company, and blame them for any problem. Your boss wants to be able to sue someone if the project has a problem caused by the external product that is being used in the project. If you buy a product and the company selling it claims support, you have someone to sue or at least to put the blame on. If we go with Open Source and it fails we do not have someone to sue... wrong--in the last years many companies have realized how good the Open Source products are and started to provide an official support for these products. So your boss cannot just dismiss your suggestion of using an Open Source Operating System. You can get a paid support just like with any other commercial OS vendor. Also remember that the less money you spend on OS and Software, the more you will be able to spend on faster and stronger hardware. =head2 Discontinued Products The OSs in this hazard group tend to be developed by a single company or organization. You might find yourself in a position where you have invested a lot of time and money into developing some proprietary software that is bundled with the OS you chose (say writing a mod_perl handler which takes advantage of some proprietary features of the OS and which will not run on any other OS). Things are under control, the performance is great and you sing with happiness on your way to work. Then, one day, the company which supplies your beloved OS goes bankrupt (not unlikely nowadays), or they produce a newer incompatible version and they will not support the old one (happens all the time). You are stuck with their early masterpiece, no support and no source code! What are you going to do? Invest more money into porting the software to another OS... 
Everyone can be hit by this mini-disaster so it is better to check the background of the company when making your choice. Even so you never know what will happen tomorrow - in 1980, a company called Tektronix did something similar to one of the Guide reviewers with its microprocessor development system. The guy just had to buy another system. He didn't buy it from Tektronix, of course. The second system never really worked very well and the firm he bought it from went bust before they ever got around to fixing it. So in 1982 he wrote his own microprocessor development system software. It didn't take long, it works fine, and he's still using it 18 years later. Free and Open Source OSs are probably less susceptible to this kind of problem. Development is usually distributed between many companies and developers, so if a person who developed a really important part of the kernel lost interest in continuing, someone else will pick the falling flag and carry on. Of course if tomorrow some better project shows up, developers might migrate there and finally drop the development: but in practice people are often given support on older versions and helped to migrate to current versions. Development tends to be more incremental than revolutionary, so upgrades are less traumatic, and there is usually plenty of notice of the forthcoming changes so that you have time to plan for them. Of course with the Open Source OSs you can have the source! So you can always have a go yourself, but do not under-estimate the amounts of work involved. There are many, many man-years of work in an OS. =head2 OS Releases Actively developed OSs generally try to keep pace with the latest technology developments, and continually optimize the kernel and other parts of the OS to become better and faster. Nowadays, Internet and networking in general are the hottest topics for system developers. Sometimes a simple OS upgrade to the latest stable version can save you an expensive hardware upgrade. Also, remember that when you buy new hardware, chances are that the latest software will make the most of it. If a new product supports an old one by virtue of backwards compatibility with previous products of the same family, you might not reap all the benefits of the new product's features. Perhaps you get almost the same functionality for much less money if you were to buy an older model of the same product. =head1 Choosing Hardware Sometimes the most expensive machine is not the one which provides the best performance. Your demands on the platform hardware are based on many aspects and affect many components. Let's discuss some of them. In the discussion we use terms that may be unfamiliar to some readers: =over 4 =item * Cluster - a group of machines connected together to perform one big or many small computational tasks in a reasonable time. Clustering can also be used to provide 'fail-over' where if one machine fails its processes are transferred to another without interruption of service. And you may be able to take one of the machines down for maintenance (or an upgrade) and keep your service running - the main server will simply not dispatch the requests to the machine that was taken down. =item * Load balancing - users are given the name of one of your machines but perhaps it cannot stand the heavy load. You can use a clustering approach to distribute the load over a number of machines. The central server, which users access initially when they type the name of your service, works as a dispatcher. 
It just redirects requests to other machines. Sometimes the central server also collects the results and returns them to the users. You get the advantages of clustering too.

There are many load balancing techniques. (See L<High-Availability Linux Project|guide::download/High_Availability_Linux_Project> for more info.)

=item * NIC - Network Interface Card. A hardware component that allows you to connect your machine to the network. It sends and receives packets; newer cards can also encrypt and decrypt packets and perform digital signing and verification of them. NICs come in different speed categories, varying from 10Mbps to 10Gbps and faster. The most common type of NIC is the one that implements the Ethernet networking protocol.

=item * RAM - Random Access Memory. It's the memory that you have in your computer. (Comes in units of 8Mb, 16Mb, 64Mb, 256Mb, etc.)

=item * RAID - Redundant Array of Inexpensive Disks. An array of physical disks, usually treated by the operating system as one single disk, and often forced to appear that way by the hardware. The reason for using RAID is often simply to achieve a high data transfer rate, but it may also be to get adequate disk capacity or high reliability. Redundancy means that the system is capable of continued operation even if a disk fails. There are various types of RAID array and several different approaches to implementing them. Some systems provide protection against failure of more than one drive, and some (`hot-swappable') systems allow a drive to be replaced without even stopping the OS. See for example the Linux `HOWTO' documents Disk-HOWTO, Module-HOWTO and Parallel-Processing-HOWTO.

=back

=head2 Machine Strength Demands According to Expected Site Traffic

If you are building a fan site and you want to amaze your friends with a mod_perl guest book, any old 486 machine could do it. But if you are in a serious business, it is very important to build a scalable server. If your service is successful and becomes popular, the traffic could double every few days, and you should be ready to add more resources to keep up with the demand. While we could define web server scalability more precisely, the important thing is to make sure that you can add more power to your web server(s) without investing much additional money in software development (you will need a little software effort to connect your servers, if you add more of them). This means that you should choose hardware and OSs that can talk to other machines and become part of a cluster.

On the other hand, if you prepare for a lot of traffic and buy a monster to do the work for you, what happens if your service doesn't prove to be as successful as you thought it would be? Then you've spent too much money, and meanwhile faster processors and other hardware components have been released, so you lose.

Wisdom and prophecy, that's all it takes :)

=head3 Single Strong Machine vs Many Weaker Machines

Let's start with the claim that a four-year-old processor is still very powerful and can be put to good use. Now let's say that for a given amount of money you can probably buy either one new very strong machine or about ten older but very cheap machines. I claim that with ten old machines connected into a cluster, and by deploying load balancing, you will be able to serve about five times more requests than with one single new machine.

Why is that? Because generally the performance improvement on a new machine is marginal while the price is much higher.
Ten machines will do faster disk I/O than one single machine, even if the new disk is quite a bit faster. Yes, you have more administration overhead, but there is a chance you will have it anyway, for in a short time the new machine you have just bought might not stand the load. Then you will have to purchase more equipment and think about how to implement load balancing and web server file system distribution anyway.

Why am I so convinced? Look at the busiest services on the Internet: search engines, web-based email servers and the like -- most of them use a clustering approach. You may not always notice it, because they hide the real implementation behind proxy servers.

=head2 Internet Connection

You have the best hardware you can get, but the service is still crawling. Make sure you have a fast Internet connection. Not as fast as your ISP claims it to be, but as fast as it should be. The ISP might have a very good connection to the Internet, but put many clients on the same line. If these are heavy clients, your traffic will have to share the same line and your throughput will suffer. Think about a dedicated connection and make sure it is truly dedicated. Don't trust the ISP, check it!

The idea of having a connection to B<The Internet> is a little misleading. Many Web hosting and co-location companies have large amounts of bandwidth, but still have poor connectivity. The public exchanges, such as MAE-East and MAE-West, frequently become overloaded, yet many ISPs depend on these exchanges.

Private peering means that providers can exchange traffic much more quickly.

Also, if your Web site is of global interest, check that the ISP has good global connectivity. If the Web site is going to be visited mostly by people in a certain country or region, your server should probably be located there.

Bad connectivity can directly influence your machine's performance. Here is a story one of the developers told on the mod_perl mailing list:

  What relationship has 10% packet loss on one upstream provider got
  to do with machine memory ?

  Yes.. a lot. For a nightmare week, the box was located downstream of
  a provider who was struggling with some serious bandwidth problems
  of his own... people were connecting to the site via this link, and
  packet loss was such that retransmits and tcp stalls were keeping
  httpd heavies around for much longer than normal.. instead of
  blasting out the data at high or even modem speeds, they would be
  stuck at 1k/sec or stalled out... people would press stop and
  refresh, httpds would take 300 seconds to timeout on writes to
  no-one.. it was a nightmare. Those problems didn't go away till I
  moved the box to a place closer to some decent backbones.

  Note that with a proxy, this only keeps a lightweight httpd tied
  up, assuming the page is small enough to fit in the buffers. If
  you are a busy internet site you always have some slow clients.
  This is a difficult thing to simulate in benchmark testing, though.

=head2 I/O Performance

If your service is I/O bound (does a lot of read/write operations to disk) you need a very fast disk, especially if you run a relational database, which is one of the main creators of I/O streams. So you should not spend the money on a fancy video card and monitor! A cheap card and a 14" monochrome monitor are perfectly adequate for a Web server; you will probably access it by C<telnet> or C<ssh> most of the time anyway. Look for disks with the best price/performance ratio. Of course, ask around and avoid disks that have a reputation for headcrashes and other disasters.
You must think about RAID or similar systems if you have an enormous data set to serve (what is an enormous data set nowadays? Gigabytes, Terabytes?) or you expect a really big web traffic. Ok, you have a fast disk, what's next? You need a fast disk controller. There may be one embedded on your computer's motherboard. If the controller is not fast enough you should buy a faster one. Don't forget that it may be necessary to disable the original controller. =head2 Memory Memory should be well tested. Many memory test programs are practically useless. Running a busy system for a few weeks without ever shutting it down is a pretty good memory test. If you increase the amount of RAM on a well-tested box, use well-tested RAM. How much RAM do you need? Nowadays, the chances are that you will hear: "Memory is cheap, the more you buy the better". But how much is enough? The answer is pretty straightforward: I<you do not want your machine to swap>. When the CPU needs to write something into memory, but memory is already full, it takes the least frequently used memory pages and swaps them out to disk. This means you have to bear the time penalty of writing the data to disk. If another process then references some of the data which happens to be on one of the pages that has just been swapped out, the CPU swaps it back in again, probably swapping out some other data that will be needed very shortly by some other process. Carried to the extreme, the CPU and disk start to I<thrash> hopelessly in circles, without getting any real work done. The less RAM there is, the more often this scenario arises. Worse, you can exhaust swap space as well, and then your troubles really start... How do you make a decision? You know the highest rate at which your server expects to serve pages and how long it takes on average to serve one. Now you can calculate how many server processes you need. If you know the maximum size your servers can grow to, you know how much memory you need. If your OS supports L<memory sharing|guide::hardware/Sharing_Memory>, you can make best use of this feature by preloading the modules and scripts at server startup, and so you will need less memory than you have calculated. Do not forget that other essential system processes need memory as well, so you should plan not only for the Web server, but also take into account the other players. Remember that requests can be queued, so you can afford to let your client wait for a few moments until a server is available to serve it. Most of the time your server will not have the maximum load, but you should be ready to bear the peaks. You need to reserve at least 20% of free memory for peak situations. Many sites have crashed a few moments after a big scoop about them was posted and an unexpected number of requests suddenly came in. (This is called the Slashdot effect, which was born at http://slashdot.org ). If you are about to announce something cool, be aware of the possible consequences. =head2 CPU Make sure that the CPU is operating within its specifications. Many boxes are shipped with incorrect settings for CPU clock speed, power supply voltage etc. Sometimes a cooling fan is not fitted. It may be ineffective because a cable assembly fouls the fan blades. Like faulty RAM, an overheating processor can cause all kinds of strange and unpredictable things to happen. Some CPUs are known to have bugs which can be serious in certain circumstances. Try not to get one of them. 
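Before moving on to bottlenecks, here is a back-of-the-envelope sketch of the memory calculation described in the Memory section above. All the numbers in it are invented assumptions used only for illustration; substitute figures measured on your own server before trusting any of them.

  #!/usr/bin/perl -w
  use strict;

  # assumed, made-up figures -- measure your own server instead
  my $peak_requests_per_sec = 30;    # expected peak request rate
  my $avg_response_time     = 0.5;   # seconds to serve one request
  my $child_size            = 10;    # MB used by one mod_perl child
  my $shared_size           = 6;     # MB of that shared with the parent
  my $system_reserve        = 128;   # MB for the OS and other daemons

  # how many children must run concurrently to sustain the peak rate
  my $children = $peak_requests_per_sec * $avg_response_time;

  # unshared memory per child, plus the shared copy counted once
  my $needed = $children * ($child_size - $shared_size)
             + $shared_size
             + $system_reserve;

  # leave about 20% of free memory for peak situations, as suggested above
  printf "children: %d, plan for about %d MB of RAM\n",
      $children, $needed * 1.2;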
=head2 Bottlenecks You might use the most expensive components, but still get bad performance. Why? Let me introduce an annoying word: bottleneck. A machine is an aggregate of many components. Almost any one of them may become a bottleneck. If you have a fast processor but a small amount of RAM, the RAM will probably be the bottleneck. The processor will be under-utilized, usually it will be waiting for the kernel to swap the memory pages in and out, because memory is too small to hold the busiest pages. If you have a lot of memory, a fast processor, a fast disk, but a slow disk controller, the disk controller will be the bottleneck. The performance will still be bad, and you will have wasted money. Use a fast NIC that does not create a bottleneck. They are cheap. If the NIC is slow, the whole service is slow. This is a most important component, since webservers are much more often network-bound than they are disk-bound! =head3 Solving Hardware Requirement Conflicts It may happen that the combination of software components which you find yourself using gives rise to conflicting requirements for the optimization of tuning parameters. If you can separate the components onto different machines you may find that this approach (a kind of clustering) solves the problem, at much less cost than buying faster hardware, because you can tune the machines individually to suit the tasks they should perform. For example if you need to run a relational database engine and mod_perl server, it can be wise to put the two on different machines, since while RDBMS need a very fast disk, mod_perl processes need lots of memory. So by placing the two on different machines it's easy to optimize each machine at separate and satisfy the each software components requirements in the best way. =head2 Conclusion To use your money optimally you have to understand the hardware very well, so you will know what to pick. Otherwise, you should hire a knowledgeable hardware consultant and employ them on a regular basis, since your needs will probably change as time goes by and your hardware will likewise be forced to adapt as well. =head1 Maintainers Maintainer is the person(s) you should contact with updates, corrections and patches. =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back =head1 Authors =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back Only the major authors are listed above. For contributors see the Changes file. =cut 1.1 modperl-docs/src/docs/general/multiuser.pod Index: multiuser.pod =================================================================== =head1 NAME mod_perl for ISPs. mod_perl and Virtual Hosts =head1 Description mod_perl hosting by ISPs: fantasy or reality? This section covers some topics that might be of interest to users looking for ISPs to host their mod_perl-based website, and ISPs looking for a way to provide such services. Today, it is a reality: there are a number of ISPs hosting mod_perl, although the number of these is not as big as we would have liked it to be. To see a list of ISPs that can provide mod_perl hosting, see L<ISPs supporting mod_perl|help::isps>. =head1 ISPs providing mod_perl services - a fantasy or a reality =over 4 =item * You installed mod_perl on your box at home, and you fell in love with it. So now you want to convert your CGI scripts (which currently are running on your favorite ISPs machine) to run under mod_perl. Then you discover that your ISP has never heard of mod_perl, or he refuses to install it for you. 
=item * You are an old sailor in the ISP business, you have seen it all, you know how many ISPs are out there and you know that the sales margins are too low to keep you happy. You are looking for some new service almost no one else provides, to attract more clients to become your users and hopefully to have a bigger slice of the action than your competitors. =back If you are a user asking for a mod_perl service or an ISP considering to provide this service, this section should make things clear for both of you. An ISP has three choices: =over 4 =item 1 ISPs probably cannot let users run scripts under mod_perl on the main server. There are many reasons for this: Scripts might leak memory, due to sloppy programming. There will not be enough memory to run as many servers as required, and clients will be not satisfied with the service because it will be slower. The question of file permissions is a very important issue: any user who is allowed to write and run a CGI script can at least read (if not write) any other files that belong to the same user and/or group the web server is running as. Note that L<it's impossible to run C<suEXEC> and C<cgiwrap> extensions under mod_perl 1.x|guide::install/Is_it_possible_to_run_mod_perl_enabled_Apache_as_suExec_>. Another issue is the security of the database connections. If you use C<Apache::DBI>, by hacking the C<Apache::DBI> code you can pick a connection from the pool of cached connections even if it was opened by someone else and your scripts are running on the same web server. Yet another security issue is a potential compromise of the systems via user's code running on the webservers. One of the possible solutions here is to use chroot(1) or jail(8) mechanisms which allow to run subsystems isolated from the main system. So if a subsystem gets compromised the whole system is still safe. There are many more things to be aware of so at this time you have to say I<No>. Of course as an ISP you can run mod_perl internally, without allowing your users to map their scripts so that they will run under mod_perl. If as a part of your service you provide scripts such as guest books, counters etc. which are not available for user modification, you can still can have these scripts running very fast. =item 2 But, hey why can't I let my users run their own servers, so I can wash my hands of them and don't have to worry about how dirty and sloppy their code is (assuming that the users are running their servers under their own usernames, to prevent them from stealing code and data from each other). This option is fine as long as you are not concerned about your new systems resource requirements. If you have even very limited experience with mod_perl, you know that mod_perl enabled Apache servers while freeing up your CPU and allowing you to run scripts very much faster, have huge memory demands (5-20 times that of plain Apache). The size depends on the code length, the sloppiness of the programming, possible memory leaks the code might have and all that multiplied by the number of children each server spawns. A very simple example: a server, serving an average number of scripts, demanding 10Mb of memory which spawns 10 children, already raises your memory requirements by 100Mb (the real requirement is actually much smaller if your OS allows code sharing between processes and programmers exploit these features in their code). Now multiply the average required size by the number of server users you intend to have and you will get the total memory requirement. 
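The same arithmetic as a tiny Perl sketch (all figures here are assumptions used only for illustration):

  #!/usr/bin/perl -w
  use strict;

  my $child_size = 10;   # MB demanded by one mod_perl child (worst case)
  my $children   = 10;   # children spawned by one user's server
  my $users      = 20;   # mod_perl users you plan to host

  my $per_user = $child_size * $children;   # 100 MB per user
  my $total    = $per_user * $users;        # 2000 MB for all users

  print "per user: ${per_user}MB, total: ${total}MB\n";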
Since ISPs never say I<No>, you'd better take the inverse approach - think of the largest memory size you can afford then divide it by one user's requirements as I have shown in this example, and you will know how many mod_perl users you can afford :) But you cannot tell how much memory your users may use? Their requirements from a single server can be very modest, but do you know how many servers they will run? After all, they have full control of I<httpd.conf> - and it has to be this way, since this is essential for the user running mod_perl. All this rumbling about memory leads to a single question: is it possible to prevent users from using more than X memory? Or another variation of the question: assuming you have as much memory as you want, can you charge users for their average memory usage? If the answer to either of the above questions is I<Yes>, you are all set and your clients will prize your name for letting them run mod_perl! There are tools to restrict resource usage (see for example the man pages for C<ulimit(3)>, C<getrlimit(2)>, C<setrlimit(2)> and C<sysconf(3)>, the last three have the corresponding Perl modules: C<BSD::Resource> and C<Apache::Resource>). [ReaderMETA]: If you have experience with other resource limiting techniques please share it with us. Thank you! If you have chosen this option, you have to provide your client with: =over 4 =item * Shutdown and startup scripts installed together with the rest of your daemon startup scripts (e.g I</etc/rc.d> directory), so that when you reboot your machine the user's server will be correctly shutdown and will be back online the moment your system starts up. Also make sure to start each server under the username the server belongs to, or you are going to be in big trouble! =item * Proxy services (in forward or httpd accelerator mode) for the user's virtual host. Since the user will have to run their server on an unprivileged port (E<gt>1024), you will have to forward all requests from C<user.given.virtual.hostname:80> (which is C<user.given.virtual.hostname> without the default port 80) to C<your.machine.ip:port_assigned_to_user> . You will also have to tell the users to code their scripts so that any self referencing URLs are of the form C<user.given.virtual.hostname>. Letting the user run a mod_perl server immediately adds a requirement for the user to be able to restart and configure their own server. Only root can bind to port 80, this is why your users have to use port numbers greater than 1024. Another solution would be to use a setuid startup script, but think twice before you go with it, since if users can modify the scripts they will get a root access. For more information refer to the section "L<SUID Start-up Scripts|guide::control/SUID_Start_up_Scripts>". =item * Another problem you will have to solve is how to assign ports between users. Since users can pick any port above 1024 to run their server, you will have to lay down some rules here so that multiple servers do not conflict. A simple example will demonstrate the importance of this problem: I am a malicious user or I am just a rival of some fellow who runs his server on your ISP. All I need to do is to find out what port my rival's server is listening to (e.g. using C<netstat(8)>) and configure my own server to listen on the same port. Although I am unable to bind to this port, imagine what will happen when you reboot your system and my startup script happens to be run before my rivals! 
I get the port first, now all requests will be redirected to my server. I'll leave to your imagination what nasty things might happen then. Of course the ugly things will quickly be revealed, but not before the damage has been done. =back Basically you can preassign each user a port, without them having to worry about finding a free one, as well as enforce C<MaxClients> and similar values by implementing the following scenario: For each user have two configuration files, the main file, I<httpd.conf> (non-writable by user) and the user's file, I<username.httpd.conf> where they can specify their own configuration parameters and override the ones defined in I<httpd.conf>. Here is what the main configuration file looks like: httpd.conf ---------- # Global/default settings, the user may override some of these ... ... # Included so that user can set his own configuration Include username.httpd.conf # User-specific settings which will override any potentially # dangerous configuration directives in username.httpd.conf ... ... username.httpd.conf ------------------- # Settings that your user would like to add/override, # like <Location> and PerlModule directives, etc. Apache reads the global/default settings first. Then it reads the I<Include>'d I<username.httpd.conf> file with whatever settings the user has chosen, and finally it reads the user-specific settings that we don't want the user to override, such as the port number. Even if the user changes the port number in his I<username.httpd.conf> file, Apache reads our settings last, so they take precedence. Note that you can use L<Perl sections|guide::config/Apache_Configuration_in_Perl> to make the configuration much easier. =item 3 A much better, but costly solution is I<co-location>. Let the user hook his (or your) stand-alone machine into your network, and forget about this user. Of course either the user or you will have to undertake all the system administration chores and it will cost your client more money. Who are the people who seek mod_perl support? They are people who run serious projects/businesses. Money is not usually an obstacle. They can afford a stand alone box, thus achieving their goal of autonomy whilst keeping their ISP happy. =back =head2 Virtual Servers Technologies As we have just seen one of the obstacles of using mod_perl in ISP environments, is the problem of isolating customers using the same machine from each other. A number of virtual servers (don't confuse with virtual hosts) technologies (both commercial and Open Source) exist today. Here are some of them: =over =item * The User-mode Linux Kernel http://user-mode-linux.sourceforge.net/ User-Mode Linux is a safe, secure way of running Linux versions and Linux processes. Run buggy software, experiment with new Linux kernels or distributions, and poke around in the internals of Linux, all without risking your main Linux setup. User-Mode Linux gives you a virtual machine that may have more hardware and software virtual resources than your actual, physical computer. Disk storage for the virtual machine is entirely contained inside a single file on your physical machine. You can assign your virtual machine only the hardware access you want it to have. With properly limited access, nothing you do on the virtual machine can change or damage your real computer, or its software. So if you want to completely protect one user from another and yourself from your users this might be yet another alternative to the solutions suggested at the beginning of this chapter. 
=item * VMWare Technology Allows running a few instances of the same or different OSs on the same machine. This technology comes in two flavors: Open source: http://www.plex86.org/ Commercial: http://www.vmware.com/ So you may want to run a separate OS for each of your clients =item * freeVSD Technology freeVSD (http://www.freevsd.org), an open source project sponsored by Idaya Ltd. The software enables ISPs to securely partition their physical servers into many I<virtual servers>, each capable of running popular hosting applications such as Apache, Sendmail and MySQL. =item * S/390 IBM server Quoting from: http://www.s390.ibm.com/linux/vif/ "The S/390 Virtual Image Facility enables you to run tens to hundreds of Linux server images on a single S/390 server. It is ideally suited for those who want to move Linux and/or UNIX workloads deployed on multiple servers onto a single S/390 server, while maintaining the same number of distinct server images. This provides centralized management and operation of the multiple image environment, reducing complexity, easing administration and lowering costs." In two words, this a great solution to huge ISPs, as it allows you to run hundreds of mod_perl servers while having only one box to maintain. The drawback is the price :) Check out this scalable mailing list thread for more details from those who know: http://archive.develooper.com/[EMAIL PROTECTED]/msg00235.html =back =head1 Virtual Hosts in the guide If you are about to use I<Virtual Hosts> you might want to read these sections: L<Apache Configuration in Perl|guide::config/Apache_Configuration_in_Perl> L<Easing the Chores of Configuring Virtual Hosts with mod_macro|guide::config/Configuring_Apache___mod_perl_with_mod_macro> L<Is There a Way to Provide a Different startup.pl File for Each Individual Virtual Host|guide::config/Is_There_a_Way_to_Provide_a_Different_startup_pl_File_for_Each_Individual_Virtual_Host> L<Is There a Way to Modify @INC on a Per-Virtual-Host or Per-Location Basis.|guide::config/Is_There_a_Way_to_Modify__INC_on_a_Per_Virtual_Host_or_Per_Location_Basis_> L<A Script From One Virtual Host Calls a Script with the Same Path From the Other Virtual Host|guide::config/A_Script_From_One_Virtual_Host_Calls_a_Script_with_the_Same_Path_From_the_Other_Virtual_Host> =head1 Maintainers Maintainer is the person(s) you should contact with updates, corrections and patches. =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back =head1 Authors =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back Only the major authors are listed above. For contributors see the Changes file. =cut 1.1 modperl-docs/src/docs/general/perl_myth.pod Index: perl_myth.pod =================================================================== =head1 NAME Popular Perl Complaints and Myths =head1 Description This document tries to explain the myths about Perl and overturn the FUD certain bodies try to spread. =head1 Abbreviations =over 4 =item * B<M> = Misconception or Myth =item * B<R> = Response =back =head2 Interpreted vs. Compiled =over 4 =item M: Each dynamic perl page hit needs to load the Perl interpreter and compile the script, then run it each time a dynamic web page is hit. This dramatically decreases performance as well as makes Perl an unscalable model since so much overhead is required to search each page. =item R: This myth was true years ago before the advent of mod_perl. mod_perl loads the interpreter once into memory and never needs to load it again. Each perl program is only compiled once. 
The compiled version is then kept in memory and used each time the program is run. In this way there is no extra overhead when hitting a mod_perl page.

=back

=head3 Interpreted vs. Compiled (More Gory Details)

=over 4

=item R: Compiled code always has the potential to be faster than interpreted code. Ultimately, all interpreted code has to be converted to native instructions at some point, and this invariably has to be done by a compiled application. That said, an interpreted language CAN be faster than a comparable native application in certain situations, given certain common programming practices. For example, the allocation and de-allocation of memory can be a relatively expensive process in a tightly scoped compiled language, whereas interpreted languages typically use garbage collectors which don't need to do expensive deallocation in a tight loop, instead waiting until additional memory is absolutely necessary, or for a less computationally intensive period. Of course, using a garbage collector in C would eliminate this edge, but whereas garbage collectors are uncommon in C, Perl and most other interpreted languages have them built in.

It is also important to point out that few people use the full potential of their modern CPU with a single application. Modern CPUs are not only more than fast enough to run interpreted code; many processors include instruction sets designed to increase the performance of interpreted code.

=back

=head2 Perl is overly memory intensive, making it unscalable

=over 4

=item M: Each child process needs the Perl interpreter and all code in memory. Even with mod_perl, httpd processes tend to be overly large, slowing performance and requiring much more hardware.

=item R: In mod_perl the interpreter is loaded into the parent process and shared between the children. Also, when scripts are loaded into the parent and the parent forks a child httpd process, that child shares those scripts with the parent. So while the child may take 6MB of memory, 5MB of that might be shared, meaning it only really uses 1MB per child. Even 5MB of memory per child is not uncommon for most web applications in other languages.

Also, most modern operating systems support the concept of shared libraries. Perl can be compiled as a shared library, enabling the bulk of the perl interpreter to be shared between processes. Some executable formats on some platforms (I believe ELF is one such format) are able to share entire executable TEXT segments between unrelated processes.

=back

=head3 More Tuning Advice:

=over 4

=item * L<Vivek Khera's mod_perl performance tuning guide|faqs::mod_perl_tuning>

=item * L<Stas Bekman's Performance Guide|guide::performance>

=back

=head2 Not enough support, or tools to develop with Perl. (Myth)

=over 4

=item R: Of all web programming languages, Perl arguably has the most support and tools. B<CPAN> is a central repository of Perl modules which are freely downloadable and usually well supported. There are literally thousands of modules which make building web apps in Perl much easier. There are also countless mailing lists of extremely responsive Perl experts who usually respond to questions within an hour. There are also a number of Perl development environments to make building Perl Web applications easier. Just to name a few, there are C<Apache::ASP>, C<Mason>, C<embPerl>, C<ePerl>, etc.

=back

=head2 If Perl scales so well, how come no large sites use it? (myth)
=over 4

=item R: Actually, many large sites DO use Perl for the bulk of their web applications. Here are some, just as an example: B<e-Toys>, B<CitySearch>, B<Internet Movie Database> ( http://imdb.com ), B<Value Click> ( http://valueclick.com ), B<Paramount Digital Entertainment>, B<CMP> ( http://cmpnet.com ), B<HotBot Mail>/B<HotBot Homepages>, and B<DejaNews>, to name a few. Even B<Microsoft> has taken an interest in Perl via http://www.activestate.com/.

=back

=head2 Perl, even with mod_perl, is always slower than C.

=over 4

=item R: The Perl engine is written in C. There is no point arguing that Perl is faster than C, because anything written in Perl could obviously be re-written in C. The same holds true for arguing that C is faster than assembly. There are two issues to consider here. First of all, many times a web application written in Perl B<CAN be faster> than one in C, thanks to the low-level optimizations in the Perl compiler. In other words, it's easier to write poorly written C than well-written Perl. Secondly, it's important to weigh all factors when choosing a language to build a web application in. Time to market is often one of the highest priorities in creating a web application. Development in Perl can often be twice as fast as in C. This is mostly due to the differences in the languages themselves, as well as the wealth of free examples and modules which speed development significantly. Perl's speedy development time can be a huge competitive advantage.

=back

=head2 Java does away with the need for Perl.

=over 4

=item M: Perl had its place in the past, but now there's Java, and Java will kill Perl.

=item R: Java and Perl are actually more complementary languages than competitive ones. It's widely accepted that server-side Java solutions such as C<JServ>, C<JSP> and C<JRUN> are far slower than mod_perl solutions (see the myth about speed above). Even so, Java is often used as the front end for server-side Perl applications. Unlike Perl, with Java you can create advanced client-side applications. Combined with the strength of server-side Perl, these client-side Java applications can be made very powerful.

=back

=head2 Perl can't create advanced client side applications

=over 4

=item R: True. There are some client-side Perl solutions, like PerlScript in MSIE 5.0, but all client-side Perl requires the user to have the Perl interpreter on their local machine, and most users do not. Most Perl programmers who need to create an advanced client-side application use Java as the client-side programming language and Perl as the server-side solution.

=back

=head2 ASP makes Perl obsolete as a web programming language.

=over 4

=item M: With Perl you have to write individual programs for each set of pages. With ASP you can write simple code directly within HTML pages. ASP is the Perl killer.

=item R: There are many solutions which allow you to embed Perl in web pages just like ASP. In fact, you can actually use Perl IN ASP pages with PerlScript. Other solutions include: C<Mason>, C<Apache::ASP>, C<ePerl>, C<embPerl> and C<XPP>. Also, Microsoft and ActiveState have worked very hard to make Perl run as well on NT as on Unix. You can even create COM modules in Perl that can be used from within ASP pages.
Some other advantages Perl has over ASP: mod_perl is usually much faster then ASP, Perl has much more example code and full programs which are freely downloadable, and Perl is cross platform, able to run on Solaris, Linux, SCO, Digital Unix, Unix V, AIX, OS2, VMS MacOS, Win95-98 and NT to name a few. Also, Benchmarks show that embedded Perl solutions outperform ASP/VB on IIS by several orders of magnitude. Perl is a much easier language for some to learn, especially those with a background in C or C++. =back =head1 Credits Thanks to the mod_perl list for all of the good information and criticism. I'd especially like to thank, =over 4 =item * Stas Bekman E<lt>[EMAIL PROTECTED]<gt> =item * Thornton Prime E<lt>[EMAIL PROTECTED]<gt> =item * Chip Turner E<lt>[EMAIL PROTECTED]<gt> =item * Clinton E<lt>[EMAIL PROTECTED]<gt> =item * Joshua Chamas E<lt>[EMAIL PROTECTED]<gt> =item * John Edstrom E<lt>[EMAIL PROTECTED]<gt> =item * Rasmus Lerdorf E<lt>[EMAIL PROTECTED]<gt> =item * Nedim Cholich E<lt>[EMAIL PROTECTED]<gt> =item * Mike Perry E<lt> http://www.icorp.net/icorp/feedback.htm E<gt> =item * Finally, I'd like to thank Robert Santos E<lt>[EMAIL PROTECTED]<gt>, CyberNation's lead Business Development guy for inspiring this document. =back =head1 Maintainers Maintainer is the person(s) you should contact with updates, corrections and patches. =over =item * Contact the L<mod_perl docs list|maillist::list-docs-dev> =back =head1 Authors =over =item * Adam Pisoni E<lt>[EMAIL PROTECTED]<gt> =back Only the major authors are listed above. For contributors see the Changes file. =cut 1.1 modperl-docs/src/docs/general/perl_reference.pod Index: perl_reference.pod =================================================================== =head1 NAME Perl Reference =head1 Description This document was born because some users are reluctant to learn Perl, prior to jumping into mod_perl. I will try to cover some of the most frequent pure Perl questions being asked at the list. Before you decide to skip this chapter make sure you know all the information provided here. The rest of the Guide assumes that you have read this chapter and understood it. =head1 perldoc's Rarely Known But Very Useful Options First of all, I want to stress that you cannot become a Perl hacker without knowing how to read Perl documentation and search through it. Books are good, but an easily accessible and searchable Perl reference at your fingertips is a great time saver. It always has the up-to-date information for the version of perl you're using. Of course you can use online Perl documentation at the Web. The two major sites are http://www.perldoc.com and http://theoryx5.uwinnipeg.ca/CPAN/perl/. The C<perldoc> utility provides you with access to the documentation installed on your system. To find out what Perl manpages are available execute: % perldoc perl To find what functions perl has, execute: % perldoc perlfunc To learn the syntax and to find examples of a specific function, you would execute (e.g. for C<open()>): % perldoc -f open Note: In perl5.005_03 and earlier, there is a bug in this and the C<-q> options of C<perldoc>. It won't call C<pod2man>, but will display the section in POD format instead. Despite this bug it's still readable and very useful. The Perl FAQ (I<perlfaq> manpage) is in several sections. To search through the sections for C<open> you would execute: % perldoc -q open This will show you all the matching Question and Answer sections, still in POD format. 
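Two other options are worth remembering, assuming the version of C<perldoc> on your system supports them (check C<perldoc perldoc> to confirm): C<-l> prints the path of the file that contains the documentation, and C<-m> displays the module's source code itself, which is often the quickest way to see how something is implemented. For example (the module name C<CGI> is just an illustration):

  % perldoc -l CGI    # print the path to the installed CGI.pm
  % perldoc -m CGI    # page through the module's source code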
To read the I<perldoc> manpage you would execute: % perldoc perldoc =head1 Tracing Warnings Reports Sometimes it's very hard to understand what a warning is complaining about. You see the source code, but you cannot understand why some specific snippet produces that warning. The mystery often results from the fact that the code can be called from different places if it's located inside a subroutine. Here is an example: warnings.pl ----------- #!/usr/bin/perl -w use strict; correct(); incorrect(); sub correct{ print_value("Perl"); } sub incorrect{ print_value(); } sub print_value{ my $var = shift; print "My value is $var\n"; } In the code above, print_value() prints the passed value. Subroutine correct() passes the value to print, but in subroutine incorrect() we forgot to pass it. When we run the script: % ./warnings.pl we get the warning: Use of uninitialized value at ./warnings.pl line 16. Perl complains about an undefined variable C<$var> at the line that attempts to print its value: print "My value is $var\n"; But how do we know why it is undefined? The reason here obviously is that the calling function didn't pass the argument. But how do we know who was the caller? In our example there are two possible callers, in the general case there can be many of them, perhaps located in other files. We can use the caller() function, which tells who has called us, but even that might not be enough: it's possible to have a longer sequence of called subroutines, and not just two. For example, here it is sub third() which is at fault, and putting sub caller() in sub second() would not help us very much: sub third{ second(); } sub second{ my $var = shift; first($var); } sub first{ my $var = shift; print "Var = $var\n" } The solution is quite simple. What we need is a full calls stack trace to the call that triggered the warning. The C<Carp> module comes to our aid with its cluck() function. Let's modify the script by adding a couple of lines. The rest of the script is unchanged. warnings2.pl ----------- #!/usr/bin/perl -w use strict; use Carp (); local $SIG{__WARN__} = \&Carp::cluck; correct(); incorrect(); sub correct{ print_value("Perl"); } sub incorrect{ print_value(); } sub print_value{ my $var = shift; print "My value is $var\n"; } Now when we execute it, we see: Use of uninitialized value at ./warnings2.pl line 19. main::print_value() called at ./warnings2.pl line 14 main::incorrect() called at ./warnings2.pl line 7 Take a moment to understand the calls stack trace. The deepest calls are printed first. So the second line tells us that the warning was triggered in print_value(); the third, that print_value() was called by subroutine, incorrect(). script => incorrect() => print_value() We go into C<incorrect()> and indeed see that we forgot to pass the variable. Of course when you write a subroutine like C<print_value> it would be a good idea to check the passed arguments before starting execution. We omitted that step to contrive an easily debugged example. Sure, you say, I could find that problem by simple inspection of the code! Well, you're right. But I promise you that your task would be quite complicated and time consuming if your code has some thousands of lines. In addition, under mod_perl, certain uses of the C<eval> operator and "here documents" are known to throw off Perl's line numbering, so the messages reporting warnings and errors can have incorrect line numbers. 
(See L<Finding the Line Which Triggered the Error or Warning|guide::debug/Finding_the_Line_Which_Triggered> for more information). Getting the trace helps a lot. =head1 Variables Globally, Lexically Scoped And Fully Qualified META: this material is new and requires polishing so read with care. You will hear a lot about namespaces, symbol tables and lexical scoping in Perl discussions, but little of it will make any sense without a few key facts: =head2 Symbols, Symbol Tables and Packages; Typeglobs There are two important types of symbol: package global and lexical. We will talk about lexical symbols later, for now we will talk only about package global symbols, which we will refer to simply as I<global symbols>. The names of pieces of your code (subroutine names) and the names of your global variables are symbols. Global symbols reside in one symbol table or another. The code itself and the data do not; the symbols are the names of pointers which point (indirectly) to the memory areas which contain the code and data. (Note for C/C++ programmers: we use the term `pointer' in a general sense of one piece of data referring to another piece of data not in a specific sense as used in C or C++.) There is one symbol table for each package, (which is why I<global symbols> are really I<package global symbols>). You are always working in one package or another. Like in C, where the first function you write must be called main(), the first statement of your first Perl script is in package C<main::> which is the default package. Unless you say otherwise by using the C<package> statement, your symbols are all in package C<main::>. You should be aware straight away that files and packages are I<not related>. You can have any number of packages in a single file; and a single package can be in one file or spread over many files. However it is very common to have a single package in a single file. To declare a package you write: package mypackagename; From the following line you are in package C<mypackagename> and any symbols you declare reside in that package. When you create a symbol (variable, subroutine etc.) Perl uses the name of the package in which you are currently working as a prefix to create the fully qualified name of the symbol. When you create a symbol, Perl creates a symbol table entry for that symbol in the current package's symbol table (by default C<main::>). Each symbol table entry is called a I<typeglob>. Each typeglob can hold information on a scalar, an array, a hash, a subroutine (code), a filehandle, a directory handle and a format, each of which all have the same name. So you see now that there are two indirections for a global variable: the symbol, (the thing's name), points to its typeglob and the typeglob for the thing's type (scalar, array, etc.) points to the data. If we had a scalar and an array with the same name their name would point to the same typeglob, but for each type of data the typeglob points to somewhere different and so the scalar's data and the array's data are completely separate and independent, they just happen to have the same name. Most of the time, only one part of a typeglob is used (yes, it's a bit wasteful). You will by now know that you distinguish between them by using what the authors of the Camel book call a I<funny character>. So if we have a scalar called `C<line>' we would refer to it in code as C<$line>, and if we had an array of the same name, that would be written, C<@line>. 
Both would point to the same typeglob (which would be called
C<*line>), but because of the I<funny character> (also known as
I<decoration>) perl won't confuse the two. Of course we might confuse
ourselves, so some programmers don't ever use the same name for more
than one type of variable.

Every global symbol is in some package's symbol table. To refer to a
global symbol we could write the I<fully qualified> name,
e.g. C<$main::line>. If we are in the same package as the symbol we
can omit the package name, e.g. C<$line> (unless you use the
C<strict> pragma and then you will have to predeclare the variable
using the C<vars> pragma). We can also omit the package name if we
have imported the symbol into our current package's namespace. If we
want to refer to a symbol that is in another package and which we
haven't imported we must use the fully qualified name,
e.g. C<$otherpkg::box>.

Most of the time you do not need to use the fully qualified symbol
name, because most of the time you will refer to package variables
from within the package. This is very much like C++ class variables.
You can work entirely within package C<main::> and never even know
you are using a package, nor that the symbols have package names. In
a way, this is a pity because you may fail to learn about packages
and they are extremely useful.

The exception is when you I<import> the variable from another
package. This creates an alias for the variable in the I<current>
package, so that you can access it without using the fully qualified
name.

Whilst global variables are useful for sharing data and are necessary
in some contexts, it is usually wisest to minimize their use and use
I<lexical variables>, discussed next, instead.

Note that when you create a variable, the low-level business of
allocating memory to store the information is handled automatically
by Perl. The interpreter keeps track of the chunks of memory to which
the pointers are pointing and takes care of undefining variables.
When all references to a variable have ceased to exist, the perl
garbage collector is free to take back the memory used, ready for
recycling. However, perl almost never returns memory it has already
used to the operating system during the lifetime of the process.

=head3 Lexical Variables and Symbols

The symbols for lexical variables (i.e. those declared using the
keyword C<my>) are the only symbols which do I<not> live in a symbol
table. Because of this, they are not available from outside the block
in which they are declared. There is no typeglob associated with a
lexical variable and a lexical variable can refer only to a scalar,
an array, a hash or a code reference. (Since perl-5.6 it can also
refer to a file glob).

If you need access to the data from outside the package then you can
return it from a subroutine, or you can create a global variable
(i.e. one which has a package prefix) which points or refers to it
and return that. The pointer or reference must be global so that you
can refer to it by a fully qualified name. But just as in C, try to
avoid having global variables. Using OO methods generally solves this
problem, by providing methods to get and set the desired value within
the object that can be lexically scoped inside the package and passed
by reference.

The phrase "lexical variable" is a bit of a misnomer; we are really
talking about "lexical symbols". The data can be referenced by a
global symbol too, and in such cases when the lexical symbol goes out
of scope the data will still be accessible through the global symbol.
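Here is a minimal sketch of that last point (the variable and
subroutine names are invented for this example): the lexical C<@data>
goes out of scope when make_list() returns, but because a reference
to it is stored in the package global C<$kept>, the data stays alive
and reachable:

  #!/usr/bin/perl -w
  use strict;
  use vars qw($kept);    # package global that will keep the data alive

  sub make_list {
      my @data = (1, 2, 3);   # lexical, goes out of scope when we return
      return \@data;          # but a reference to the data escapes
  }

  $kept = make_list();
  print "@$kept\n";           # prints: 1 2 3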
This is perfectly legitimate and cannot be compared to the terrible
mistake of taking a pointer to an automatic C variable and returning
it from a function--when the pointer is dereferenced there will be a
segmentation fault. (Note for C/C++ programmers: having a function
return a pointer to an auto variable is a disaster in C or C++; the
Perl equivalent, returning a reference to a lexical variable created
in a function, is normal and useful.)

=over

=item * C<my()> vs. C<use vars>:

With use vars(), you are making an entry in the symbol table, and you
are telling the compiler that you are going to be referencing that
entry without an explicit package name. With my(), NO ENTRY IS PUT IN
THE SYMBOL TABLE. The compiler figures out C<at compile time> which
my() variables (i.e. lexical variables) are the same as each other,
and once you hit execute time you cannot go looking those variables
up in the symbol table.

=item * C<my()> vs. C<local()>:

local() creates a temporally-limited package-based scalar, array,
hash, or glob -- when the scope of definition is exited at runtime,
the previous value (if any) is restored. References to such a
variable are *also* global... only the value changes. (Aside: that is
what causes variable suicide. :)

my() creates a lexically-limited non-package-based scalar, array, or
hash -- when the scope of definition is exited at compile-time, the
variable ceases to be accessible. Any references to such a variable
at runtime turn into unique anonymous variables on each scope exit.

=back

=head2 Additional reading references

For more information see: L<Using global variables and sharing them
between modules/packages|guide::perl/Using_Global_Variables_and_Shari>
and an article by Mark-Jason Dominus about how Perl handles variables
and namespaces, and the difference between C<use vars()> and C<my()>
- http://www.plover.com/~mjd/perl/FAQs/Namespaces.html .

=head1 my() Scoped Variable in Nested Subroutines

Before we proceed, let's make the assumption that we want to develop
the code under the C<strict> pragma. We will use lexically scoped
variables (with the help of the my() operator) whenever possible.

=head2 The Poison

Let's look at this code:

  nested.pl
  -----------
  #!/usr/bin/perl

  use strict;

  sub print_power_of_2 {
    my $x = shift;

    sub power_of_2 {
      return $x ** 2;
    }

    my $result = power_of_2();
    print "$x^2 = $result\n";
  }

  print_power_of_2(5);
  print_power_of_2(6);

Don't let the weird subroutine names fool you: the print_power_of_2()
subroutine should print the square of the number passed to it. Let's
run the code and see whether it works:

  % ./nested.pl

  5^2 = 25
  6^2 = 25

Ouch, something is wrong. Maybe there is a bug in Perl and it doesn't
work correctly with the number 6? Let's try again using 5 and 7:

  print_power_of_2(5);
  print_power_of_2(7);

And run it:

  % ./nested.pl

  5^2 = 25
  7^2 = 25

Wow, does it work only for 5? How about using 3 and 5:

  print_power_of_2(3);
  print_power_of_2(5);

and the result is:

  % ./nested.pl

  3^2 = 9
  5^2 = 9

Now we start to understand--only the first call to the
print_power_of_2() function works correctly. This makes us think that
our code has some kind of memory for the results of the first
execution, or that it ignores the arguments in subsequent executions.

=head2 The Diagnosis

Let's follow the guidelines and use the C<-w> flag. Now execute the
code:

  % ./nested.pl

  Variable "$x" will not stay shared at ./nested.pl line 9.
  5^2 = 25
  6^2 = 25

We have never seen such a warning message before and we don't quite
understand what it means.
The C<diagnostics> pragma will certainly help us. Let's prepend this pragma before the C<strict> pragma in our code: #!/usr/bin/perl -w use diagnostics; use strict; And execute it: % ./nested.pl Variable "$x" will not stay shared at ./nested.pl line 10 (#1) (W) An inner (nested) named subroutine is referencing a lexical variable defined in an outer subroutine. When the inner subroutine is called, it will probably see the value of the outer subroutine's variable as it was before and during the *first* call to the outer subroutine; in this case, after the first call to the outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable. In other words, the variable will no longer be shared. Furthermore, if the outer subroutine is anonymous and references a lexical variable outside itself, then the outer and inner subroutines will never share the given variable. This problem can usually be solved by making the inner subroutine anonymous, using the sub {} syntax. When inner anonymous subs that reference variables in outer subroutines are called or referenced, they are automatically rebound to the current values of such variables. 5^2 = 25 6^2 = 25 Well, now everything is clear. We have the B<inner> subroutine power_of_2() and the B<outer> subroutine print_power_of_2() in our code. When the inner power_of_2() subroutine is called for the first time, it sees the value of the outer print_power_of_2() subroutine's C<$x> variable. On subsequent calls the inner subroutine's C<$x> variable won't be updated, no matter what new values are given to C<$x> in the outer subroutine. There are two copies of the C<$x> variable, no longer a single one shared by the two routines. =head2 The Remedy The C<diagnostics> pragma suggests that the problem can be solved by making the inner subroutine anonymous. An anonymous subroutine can act as a I<closure> with respect to lexically scoped variables. Basically this means that if you define a subroutine in a particular B<lexical> context at a particular moment, then it will run in that same context later, even if called from outside that context. The upshot of this is that when the subroutine B<runs>, you get the same copies of the lexically scoped variables which were visible when the subroutine was B<defined>. So you can pass arguments to a function when you define it, as well as when you invoke it. Let's rewrite the code to use this technique: anonymous.pl -------------- #!/usr/bin/perl use strict; sub print_power_of_2 { my $x = shift; my $func_ref = sub { return $x ** 2; }; my $result = &$func_ref(); print "$x^2 = $result\n"; } print_power_of_2(5); print_power_of_2(6); Now C<$func_ref> contains a reference to an anonymous subroutine, which we later use when we need to get the power of two. Since it is anonymous, the subroutine will automatically be rebound to the new value of the outer scoped variable C<$x>, and the results will now be as expected. Let's verify: % ./anonymous.pl 5^2 = 25 6^2 = 36 So we can see that the problem is solved. =head1 Understanding Closures -- the Easy Way In Perl, a closure is just a subroutine that refers to one or more lexical variables declared outside the subroutine itself and must therefore create a distinct clone of the environment on the way out. And both named subroutines and anonymous subroutines can be closures. 
Here's how to tell if a subroutine is a closure or not:

  for (1..5) {
      push @a, sub { "hi there" };
  }

  for (1..5) {
      {
          my $b;
          push @b, sub { $b."hi there" };
      }
  }

  print "anon normal:\n", join "\t\n",@a,"\n";
  print "anon closure:\n",join "\t\n",@b,"\n";

which generates:

  anon normal:
  CODE(0x80568e4)
  CODE(0x80568e4)
  CODE(0x80568e4)
  CODE(0x80568e4)
  CODE(0x80568e4)

  anon closure:
  CODE(0x804b4c0)
  CODE(0x8056b54)
  CODE(0x8056bb4)
  CODE(0x80594d8)
  CODE(0x8059538)

Note how each code reference from the non-closure is identical, but
the closure form must generate distinct coderefs to point at the
distinct instances of the closure.

And now the same with named subroutines:

  for (1..5) {
      sub a { "hi there" };
      push @a, \&a;
  }

  for (1..5) {
      {
          my $b;
          sub b { $b."hi there" };
          push @b, \&b;
      }
  }

  print "normal:\n", join "\t\n",@a,"\n";
  print "closure:\n",join "\t\n",@b,"\n";

which generates:

  normal:
  CODE(0x80568c0)
  CODE(0x80568c0)
  CODE(0x80568c0)
  CODE(0x80568c0)
  CODE(0x80568c0)

  closure:
  CODE(0x8056998)
  CODE(0x8056998)
  CODE(0x8056998)
  CODE(0x8056998)
  CODE(0x8056998)

We can see that both versions have generated the same code reference.
For the subroutine I<a> it's easy, since it doesn't include any
lexical variables defined outside it in the same lexical scope. As
for the subroutine I<b>, it's indeed a closure, but Perl won't
recompile it since it's a named subroutine (see the I<perlsub>
manpage). It's something that we don't want to happen in our code
unless we want it for this special effect, similar to I<static>
variables in C.

This is the underpinning of that famous I<"won't stay shared">
message. A I<my> variable in a named subroutine context generates
identical code references and therefore ignores any future changes to
the lexical variables outside of it.

=head1 When You Cannot Get Rid of The Inner Subroutine

First you might wonder, why in the world would someone need to define
an inner subroutine? Well, for example, to reduce some of Perl's
script startup overhead you might decide to write a daemon that will
compile the scripts and modules only once, and cache the pre-compiled
code in memory. When some script is to be executed, you just tell the
daemon the name of the script to run and it will do the rest and do
it much faster since compilation has already taken place.

Seems like an easy task, and it is. The only problem is: once the
script is compiled, how do you execute it? Or let's put it the other
way: after it was executed for the first time and it stays compiled
in the daemon's memory, how do you call it again? If you could get
all developers to code their scripts so each has a subroutine called
run() that will actually execute the code in the script, then we've
solved half the problem.

But how does the daemon know to refer to some specific script if they
all run in the C<main::> namespace? One solution might be to ask the
developers to declare a package in each and every script, and for the
package name to be derived from the script name. However, since there
is a chance that there will be more than one script with the same
name but residing in different directories, in order to prevent
namespace collisions the directory has to be a part of the package
name too. And don't forget that the script may be moved from one
directory to another, so you will have to make sure that the package
name is corrected every time the script gets moved.

But why enforce these strange rules on developers, when we can
arrange for our daemon to do this work?
For every script that the daemon is about to execute for the first time, the script should be wrapped inside the package whose name is constructed from the mangled path to the script and a subroutine called run(). For example if the daemon is about to execute the script I</tmp/hello.pl>: hello.pl -------- #!/usr/bin/perl print "Hello\n"; Prior to running it, the daemon will change the code to be: wrapped_hello.pl ---------------- package cache::tmp::hello_2epl; sub run{ #!/usr/bin/perl print "Hello\n"; } The package name is constructed from the prefix C<cache::>, each directory separation slash is replaced with C<::>, and non alphanumeric characters are encoded so that for example C<.> (a dot) becomes C<_2e> (an underscore followed by the ASCII code for a dot in hex representation). % perl -e 'printf "%x",ord(".")' prints: C<2e>. The underscore is the same you see in URL encoding except the C<%> character is used instead (C<%2E>), but since C<%> has a special meaning in Perl (prefix of hash variable) it couldn't be used. Now when the daemon is requested to execute the script I</tmp/hello.pl>, all it has to do is to build the package name as before based on the location of the script and call its run() subroutine: use cache::tmp::hello_2epl; cache::tmp::hello_2epl::run(); We have just written a partial prototype of the daemon we wanted. The only outstanding problem is how to pass the path to the script to the daemon. This detail is left as an exercise for the reader. If you are familiar with the C<Apache::Registry> module, you know that it works in almost the same way. It uses a different package prefix and the generic function is called handler() and not run(). The scripts to run are passed through the HTTP protocol's headers. Now you understand that there are cases where your normal subroutines can become inner, since if your script was a simple: simple.pl --------- #!/usr/bin/perl sub hello { print "Hello" } hello(); Wrapped into a run() subroutine it becomes: simple.pl --------- package cache::simple_2epl; sub run{ #!/usr/bin/perl sub hello { print "Hello" } hello(); } Therefore, hello() is an inner subroutine and if you have used my() scoped variables defined and altered outside and used inside hello(), it won't work as you expect starting from the second call, as was explained in the previous section. =head2 Remedies for Inner Subroutines First of all there is nothing to worry about, as long as you don't forget to turn the warnings On. If you do happen to have the "L<my() Scoped Variable in Nested Subroutines|guide::perl/my_Scoped_Variable_in_Nested_S>" problem, Perl will always alert you. Given that you have a script that has this problem, what are the ways to solve it? There are many of them and we will discuss some of them here. We will use the following code to show the different solutions. multirun.pl ----------- #!/usr/bin/perl -w use strict; for (1..3){ print "run: [time $_]\n"; run(); } sub run{ my $counter = 0; increment_counter(); increment_counter(); sub increment_counter{ $counter++; print "Counter is equal to $counter !\n"; } } # end of sub run This code executes the run() subroutine three times, which in turn initializes the C<$counter> variable to 0, every time it is executed and then calls the inner subroutine increment_counter() twice. Sub increment_counter() prints C<$counter>'s value after incrementing it. One might expect to see the following output: run: [time 1] Counter is equal to 1 ! Counter is equal to 2 ! run: [time 2] Counter is equal to 1 ! 
  Counter is equal to 2 !
  run: [time 3]
  Counter is equal to 1 !
  Counter is equal to 2 !

But as we have already learned from the previous sections, this is
not what we are going to see. Indeed, when we run the script we see:

  % ./multirun.pl

  Variable "$counter" will not stay shared at ./multirun.pl line 18.
  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 3 !
  Counter is equal to 4 !
  run: [time 3]
  Counter is equal to 5 !
  Counter is equal to 6 !

Obviously, the C<$counter> variable is not reinitialized on each
execution of run(). It retains its value from the previous execution,
and sub increment_counter() increments that.

One of the workarounds is to use globally declared variables, with
the C<vars> pragma.

  multirun1.pl
  -----------
  #!/usr/bin/perl -w

  use strict;
  use vars qw($counter);

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {

    $counter = 0;

    increment_counter();
    increment_counter();

    sub increment_counter{
      $counter++;
      print "Counter is equal to $counter !\n";
    }

  } # end of sub run

If you run this and the other solutions offered below, the expected
output will be generated:

  % ./multirun1.pl

  run: [time 1]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 2]
  Counter is equal to 1 !
  Counter is equal to 2 !
  run: [time 3]
  Counter is equal to 1 !
  Counter is equal to 2 !

By the way, the warning we saw before has gone, and so has the
problem, since there is no C<my()> (lexically defined) variable used
in the nested subroutine.

Another approach is to use fully qualified variables. This is better,
since less memory will be used, but it adds a typing overhead:

  multirun2.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {

    $main::counter = 0;

    increment_counter();
    increment_counter();

    sub increment_counter{
      $main::counter++;
      print "Counter is equal to $main::counter !\n";
    }

  } # end of sub run

You can also pass the variable to the subroutine by value and make
the subroutine return it after it was updated. This adds time and
memory overheads, so it may not be a good idea if the variable can be
very large, or if speed of execution is an issue.

Don't rely on the fact that the variable is small during the
development of the application; it can grow quite big in situations
you don't expect. For example, a very simple HTML form text entry
field can return a few megabytes of data if one of your users is
bored and wants to test how good your code is. It's not uncommon to
see users copy-and-paste 10Mb core dump files into a form's text
fields and then submit them for your script to process.

  multirun3.pl
  -----------
  #!/usr/bin/perl -w

  use strict;

  for (1..3){
    print "run: [time $_]\n";
    run();
  }

  sub run {

    my $counter = 0;

    $counter = increment_counter($counter);
    $counter = increment_counter($counter);

    sub increment_counter{
      my $counter = shift;

      $counter++;
      print "Counter is equal to $counter !\n";

      return $counter;
    }

  } # end of sub run

Finally, you can use references to do the job. The version of
increment_counter() below accepts a reference to the C<$counter>
variable and increments its value after first dereferencing it. When
you use a reference, the variable you use inside the function is
physically the same bit of memory as the one outside the function.
This technique is often used to enable a called function to modify
variables in a calling function.
multirun4.pl ----------- #!/usr/bin/perl -w use strict; for (1..3){ print "run: [time $_]\n"; run(); } sub run { my $counter = 0; increment_counter(\$counter); increment_counter(\$counter); sub increment_counter{ my $r_counter = shift; $$r_counter++; print "Counter is equal to $$r_counter !\n"; } } # end of sub run Here is yet another and more obscure reference usage. We modify the value of C<$counter> inside the subroutine by using the fact that variables in C<@_> are aliases for the actual scalar parameters. Thus if you called a function with two arguments, those would be stored in C<$_[0]> and C<$_[1]>. In particular, if an element C<$_[0]> is updated, the corresponding argument is updated (or an error occurs if it is not updatable as would be the case of calling the function with a literal, e.g. I<increment_counter(5)>). multirun5.pl ----------- #!/usr/bin/perl -w use strict; for (1..3){ print "run: [time $_]\n"; run(); } sub run { my $counter = 0; increment_counter($counter); increment_counter($counter); sub increment_counter{ $_[0]++; print "Counter is equal to $_[0] !\n"; } } # end of sub run The approach given above should be properly documented of course. Here is a solution that avoids the problem entirely by splitting the code into two files; the first is really just a wrapper and loader, the second file contains the heart of the code. multirun6.pl ----------- #!/usr/bin/perl -w use strict; require 'multirun6-lib.pl' ; for (1..3){ print "run: [time $_]\n"; run(); } Separate file: multirun6-lib.pl ---------------- use strict ; my $counter; sub run { $counter = 0; increment_counter(); increment_counter(); } sub increment_counter{ $counter++; print "Counter is equal to $counter !\n"; } 1 ; Now you have at least six workarounds to choose from. For more information please refer to perlref and perlsub manpages. =head1 use(), require(), do(), %INC and @INC Explained =head2 The @INC array C<@INC> is a special Perl variable which is the equivalent of the shell's C<PATH> variable. Whereas C<PATH> contains a list of directories to search for executables, C<@INC> contains a list of directories from which Perl modules and libraries can be loaded. When you use(), require() or do() a filename or a module, Perl gets a list of directories from the C<@INC> variable and searches them for the file it was requested to load. If the file that you want to load is not located in one of the listed directories, you have to tell Perl where to find the file. You can either provide a path relative to one of the directories in C<@INC>, or you can provide the full path to the file. =head2 The %INC hash C<%INC> is another special Perl variable that is used to cache the names of the files and the modules that were successfully loaded and compiled by use(), require() or do() statements. Before attempting to load a file or a module with use() or require(), Perl checks whether it's already in the C<%INC> hash. If it's there, the loading and therefore the compilation are not performed at all. Otherwise the file is loaded into memory and an attempt is made to compile it. do() does unconditional loading--no lookup in the C<%INC> hash is made. If the file is successfully loaded and compiled, a new key-value pair is added to C<%INC>. The key is the name of the file or module as it was passed to the one of the three functions we have just mentioned, and if it was found in any of the C<@INC> directories except C<"."> the value is the full path to it in the file system. 
The following examples will make it easier to understand the logic.

First, let's see what the contents of C<@INC> are on my system:

  % perl -e 'print join "\n", @INC'
  /usr/lib/perl5/5.00503/i386-linux
  /usr/lib/perl5/5.00503
  /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005
  .

Notice that C<.> (the current directory) is the last directory in the
list.

Now let's load the module C<strict.pm> and see the contents of
C<%INC>:

  % perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'

  strict.pm => /usr/lib/perl5/5.00503/strict.pm

Since C<strict.pm> was found in the I</usr/lib/perl5/5.00503/>
directory and I</usr/lib/perl5/5.00503/> is a part of C<@INC>,
C<%INC> includes the full path as the value for the key C<strict.pm>.

Now let's create the simplest module in C</tmp/test.pm>:

  test.pm
  -------
  1;

It does nothing, but returns a true value when loaded. Now let's load
it in different ways:

  % cd /tmp
  % perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => test.pm

Since the file was found relative to C<.> (the current directory),
the relative path is inserted as the value. If we alter C<@INC> by
adding I</tmp> to the end:

  % cd /tmp
  % perl -e 'BEGIN{push @INC, "/tmp"} use test; \
  print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => test.pm

Here we still get the relative path, since the module was found first
relative to C<".">. The directory I</tmp> was placed after C<.> in the
list. If we execute the same code from a different directory, the
C<"."> directory won't match,

  % cd /
  % perl -e 'BEGIN{push @INC, "/tmp"} use test; \
  print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => /tmp/test.pm

so we get the full path. We can also prepend the path with unshift(),
so it will be used for matching before C<"."> and therefore we will
get the full path as well:

  % cd /tmp
  % perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
  print map {"$_ => $INC{$_}\n"} keys %INC'

  test.pm => /tmp/test.pm

The code:

  BEGIN{unshift @INC, "/tmp"}

can be replaced with the more elegant:

  use lib "/tmp";

This is almost equivalent to our C<BEGIN> block and is the
recommended approach.

These approaches to modifying C<@INC> can be labor intensive, since
if you want to move the script around in the file system you have to
modify the path. This can be painful, for example, when you move your
scripts from development to a production server.

There is a module called C<FindBin> which solves this problem in the
plain Perl world, but unfortunately it won't work under mod_perl,
since it's a module and like any module it's loaded only once. So the
first script using it will have all the settings correct, but the
rest of the scripts will not if they are located in a different
directory from the first. For the sake of completeness, I'll present
this module anyway.

If you use this module, you don't need to write a hard-coded path.
The following snippet does all the work for you (the file is
I</tmp/load.pl>):

  load.pl
  -------
  #!/usr/bin/perl

  use FindBin ();
  use lib "$FindBin::Bin";
  use test;
  print "test.pm => $INC{'test.pm'}\n";

In the above example C<$FindBin::Bin> is equal to I</tmp>. If we move
the script somewhere else, e.g. I</tmp/new_dir>, then C<$FindBin::Bin>
will be equal to I</tmp/new_dir>.

  % /tmp/load.pl

  test.pm => /tmp/test.pm

This is just like C<use lib> except that no hard-coded path is
required.

You can use this workaround to make it work under mod_perl.

  do 'FindBin.pm';
  unshift @INC, "$FindBin::Bin";
  require test;
  #maybe test::import( ...
) here if need to import stuff This has a slight overhead because it will load from disk and recompile the C<FindBin> module on each request. So it may not be worth it. =head2 Modules, Libraries and Program Files Before we proceed, let's define what we mean by I<module>, I<library> and I<program file>. =over =item * Libraries These are files which contain Perl subroutines and other code. When these are used to break up a large program into manageable chunks they don't generally include a package declaration; when they are used as subroutine libraries they often do have a package declaration. Their last statement returns true, a simple C<1;> statement ensures that. They can be named in any way desired, but generally their extension is I<.pl>. Examples: config.pl ---------- # No package so defaults to main:: $dir = "/home/httpd/cgi-bin"; $cgi = "/cgi-bin"; 1; mysubs.pl ---------- # No package so defaults to main:: sub print_header{ print "Content-type: text/plain\r\n\r\n"; } 1; web.pl ------------ package web ; # Call like this: web::print_with_class('loud',"Don't shout!"); sub print_with_class{ my( $class, $text ) = @_ ; print qq{<span class="$class">$text</span>}; } 1; =item * Modules A file which contains perl subroutines and other code. It generally declares a package name at the beginning of it. Modules are generally used either as function libraries (which I<.pl> files are still but less commonly used for), or as object libraries where a module is used to define a class and its methods. Its last statement returns true. The naming convention requires it to have a I<.pm> extension. Example: MyModule.pm ----------- package My::Module; $My::Module::VERSION = 0.01; sub new{ return bless {}, shift;} END { print "Quitting\n"} 1; =item * Program Files Many Perl programs exist as a single file. Under Linux and other Unix-like operating systems the file often has no suffix since the operating system can determine that it is a perl script from the first line (shebang line) or if it's Apache that executes the code, there is a variety of ways to tell how and when the file should be executed. Under Windows a suffix is normally used, for example C<.pl> or C<.plx>. The program file will normally C<require()> any libraries and C<use()> any modules it requires for execution. It will contain Perl code but won't usually have any package names. Its last statement may return anything or nothing. =back =head2 require() require() reads a file containing Perl code and compiles it. Before attempting to load the file it looks up the argument in C<%INC> to see whether it has already been loaded. If it has, require() just returns without doing a thing. Otherwise an attempt will be made to load and compile the file. require() has to find the file it has to load. If the argument is a full path to the file, it just tries to read it. For example: require "/home/httpd/perl/mylibs.pl"; If the path is relative, require() will attempt to search for the file in all the directories listed in C<@INC>. For example: require "mylibs.pl"; If there is more than one occurrence of the file with the same name in the directories listed in C<@INC> the first occurrence will be used. The file must return I<TRUE> as the last statement to indicate successful execution of any initialization code. Since you never know what changes the file will go through in the future, you cannot be sure that the last statement will always return I<TRUE>. That's why the suggestion is to put "C<1;>" at the end of file. 
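To see what require() does when a file forgets that final true value,
here is a minimal sketch (I<mylib.pl> and its contents are made up
for this example, and it assumes the file sits in the current
directory, which is in C<@INC>):

  mylib.pl
  --------
  sub hello { print "Hello\n" }
  0;    # should have been 1; -- an untrue last value

  % perl -e 'require "mylib.pl"'
  mylib.pl did not return a true value at -e line 1.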
Although you should use the real filename for most files, if the file
is a L<module|guide::perl/Modules__Libraries_and_Program_Files>, you
may use the following convention instead:

  require My::Module;

This is equivalent to:

  require "My/Module.pm";

If require() fails to load the file, either because it couldn't find
the file in question, the code failed to compile, or it didn't return
I<TRUE>, then the program will die(). To prevent this, the require()
statement can be enclosed in an eval() exception-handling block, as
in this example:

  require.pl
  ----------
  #!/usr/bin/perl -w

  eval { require "/file/that/does/not/exists"};

  if ($@) {
    print "Failed to load, because : $@"
  }

  print "\nHello\n";

When we execute the program:

  % ./require.pl

  Failed to load, because : Can't locate /file/that/does/not/exists
  in @INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux
  /usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005 .) at require.pl line 3.

  Hello

We see that the program didn't die(), because I<Hello> was printed.
This I<trick> is useful when you want to check whether a user has
some module installed, but if she hasn't it's not critical: perhaps
the program can run without this module, with reduced functionality.

If we remove the eval() part and try again:

  require1.pl
  -----------
  #!/usr/bin/perl -w

  require "/file/that/does/not/exists";
  print "\nHello\n";

  % ./require1.pl

  Can't locate /file/that/does/not/exists in @INC (@INC contains:
  /usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503
  /usr/lib/perl5/site_perl/5.005/i386-linux
  /usr/lib/perl5/site_perl/5.005 .) at require1.pl line 3.

The program just die()s in the last example, which is what you want
in most cases.

For more information refer to the perlfunc manpage.

=head2 use()

use(), just like require(), loads and compiles files containing Perl
code, but it works with
L<modules|guide::perl/Modules__Libraries_and_Program_Files> only and
is executed at compile time. The only way to pass a module to load is
by its module name and not its filename. If the module is located in
I<MyCode.pm>, the correct way to use() it is:

  use MyCode

and not:

  use "MyCode.pm"

use() translates the passed argument into a file name by replacing
C<::> with the operating system's path separator (normally C</>) and
appending I<.pm> at the end. So C<My::Module> becomes I<My/Module.pm>.

use() is exactly equivalent to:

  BEGIN { require Module; Module->import(LIST); }

Internally it calls require() to do the loading and compilation
chores. When require() finishes its job, import() is called unless
C<()> is the second argument. The following pairs are equivalent:

  use MyModule;
  BEGIN {require MyModule; MyModule->import; }

  use MyModule qw(foo bar);
  BEGIN {require MyModule; MyModule->import("foo","bar"); }

  use MyModule ();
  BEGIN {require MyModule; }

The first pair exports the default symbols. This happens if the
module sets C<@EXPORT> to a list of symbols to be exported by
default. The module's manpage normally describes what symbols are
exported by default.

The second pair exports only the symbols passed as arguments.

The third pair describes the case where the caller does not want any
symbols to be imported.

C<import()> is not a builtin function; it's just an ordinary static
method call into the "C<MyModule>" package to tell the module to
import the list of features back into the current package. See the
Exporter manpage for more information.
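Since import() is just a class method, a module can define its own
instead of inheriting Exporter's. Here is a minimal sketch (the
module C<My::Hello> and its message are invented for this example,
and it assumes I<My/Hello.pm> lives under a directory listed in
C<@INC>) showing that C<use> really is a require() followed by a
method call at compile time:

  My/Hello.pm
  -----------
  package My::Hello;

  sub import {
      my ($class, @symbols) = @_;
      # this runs at the caller's compile time, via use()
      print "import() called on $class with: @symbols\n";
  }

  1;

  % perl -e 'use My::Hello qw(foo bar)'
  import() called on My::Hello with: foo bar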
When you write your own modules, always remember that it's better to
use C<@EXPORT_OK> instead of C<@EXPORT>, since the former doesn't
export symbols unless it was asked to. Exports pollute the namespace
of the module user. Also avoid short or common symbol names to reduce
the risk of name clashes.

When functions and variables aren't exported you can still access
them using their full names, like C<$My::Module::bar> or
C<My::Module::foo()>. By convention you can use a leading underscore
on names to informally indicate that they are I<internal> and not for
public use.

There's a corresponding "C<no>" command that un-imports symbols
imported by C<use>, i.e., it calls C<Module-E<gt>unimport(LIST)>
instead of C<import()>.

=head2 do()

While do() behaves almost identically to require(), it reloads the
file unconditionally. It doesn't check C<%INC> to see whether the
file was already loaded.

If do() cannot read the file, it returns C<undef> and sets C<$!> to
report the error. If do() can read the file but cannot compile it, it
returns C<undef> and puts an error message in C<$@>. If the file is
successfully compiled, do() returns the value of the last expression
evaluated.

=head1 Using Global Variables and Sharing Them Between Modules/Packages

It helps when you code your application in a structured way, using
Perl packages, but as you probably know, once you start using
packages it's much harder to share variables between the various
packages. A configuration package comes to mind as a good example of
a package that wants its variables to be accessible from other
modules.

Of course, using Object Oriented (OO) programming is the best way to
provide access to variables through accessor methods. But if you are
not yet ready for OO techniques, you can still benefit from using the
techniques we are going to talk about.

=head2 Making Variables Global

When you first write C<$x> in your code, you create a (package)
global variable. It is visible everywhere in your program, although
if used in a package other than the package in which it was declared
(C<main::> by default), it must be referred to with its fully
qualified name, unless you have imported this variable with import().
This will work only if you do not use the C<strict> pragma; but you
I<have> to use this pragma if you want to run your scripts under
mod_perl. Read L<The strict pragma|guide::porting/The_strict_pragma>
to find out why.

=head2 Making Variables Global With strict Pragma On

First you use:

  use strict;

Then you use:

  use vars qw($scalar %hash @array);

This declares the named variables as package globals in the current
package. They may be referred to within the same file and package
with their unqualified names; and in different files/packages with
their fully qualified names.

With perl5.6 you can use the C<our> operator instead:

  our($scalar, %hash, @array);

If you want to share package global variables between packages, here
is what you can do.

=head2 Using Exporter.pm to Share Global Variables

Assume that you want to share the C<CGI.pm> object (I will use C<$q>)
between your modules. For example, you create it in C<script.pl>, but
you want it to be visible in C<My::HTML>. First, you make C<$q>
global.

  script.pl:
  ----------------
  use vars qw($q);
  use CGI;
  use lib qw(.);
  use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
  $q = CGI->new;

  My::HTML::printmyheader();

Note that we have imported C<$q> from C<My::HTML>.
And C<My::HTML> does the export of C<$q>: My/HTML.pm ---------------- package My::HTML; use strict; BEGIN { use Exporter (); @My::HTML::ISA = qw(Exporter); @My::HTML::EXPORT = qw(); @My::HTML::EXPORT_OK = qw($q); } use vars qw($q); sub printmyheader{ # Whatever you want to do with $q... e.g. print $q->header(); } 1; So the C<$q> is shared between the C<My::HTML> package and C<script.pl>. It will work vice versa as well, if you create the object in C<My::HTML> but use it in C<script.pl>. You have true sharing, since if you change C<$q> in C<script.pl>, it will be changed in C<My::HTML> as well. What if you need to share C<$q> between more than two packages? For example you want My::Doc to share C<$q> as well. You leave C<My::HTML> untouched, and modify I<script.pl> to include: use My::Doc qw($q); Then you add the same C<Exporter> code that we used in C<My::HTML>, into C<My::Doc>, so that it also exports C<$q>. One possible pitfall is when you want to use C<My::Doc> in both C<My::HTML> and I<script.pl>. Only if you add use My::Doc qw($q); into C<My::HTML> will C<$q> be shared. Otherwise C<My::Doc> will not share C<$q> any more. To make things clear here is the code: script.pl: ---------------- use vars qw($q); use CGI; use lib qw(.); use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl use My::Doc qw($q); # Ditto $q = new CGI; My::HTML::printmyheader(); My/HTML.pm ---------------- package My::HTML; use strict; BEGIN { use Exporter (); @My::HTML::ISA = qw(Exporter); @My::HTML::EXPORT = qw(); @My::HTML::EXPORT_OK = qw($q); } use vars qw($q); use My::Doc qw($q); sub printmyheader{ # Whatever you want to do with $q... e.g. print $q->header(); My::Doc::printtitle('Guide'); } 1; My/Doc.pm ---------------- package My::Doc; use strict; BEGIN { use Exporter (); @My::Doc::ISA = qw(Exporter); @My::Doc::EXPORT = qw(); @My::Doc::EXPORT_OK = qw($q); } use vars qw($q); sub printtitle{ my $title = shift || 'None'; print $q->h1($title); } 1; =head2 Using the Perl Aliasing Feature to Share Global Variables As the title says you can import a variable into a script or module without using C<Exporter.pm>. I have found it useful to keep all the configuration variables in one module C<My::Config>. But then I have to export all the variables in order to use them in other modules, which is bad for two reasons: polluting other packages' name spaces with extra tags which increases the memory requirements; and adding the overhead of keeping track of what variables should be exported from the configuration module and what imported, for some particular package. I solve this problem by keeping all the variables in one hash C<%c> and exporting that. Here is an example of C<My::Config>: package My::Config; use strict; use vars qw(%c); %c = ( # All the configs go here scalar_var => 5, array_var => [qw(foo bar)], hash_var => { foo => 'Foo', bar => 'BARRR', }, ); 1; Now in packages that want to use the configuration variables I have either to use the fully qualified names like C<$My::Config::test>, which I dislike or import them as described in the previous section. But hey, since we have only one variable to handle, we can make things even simpler and save the loading of the C<Exporter.pm> package. 
We will use the Perl aliasing feature for exporting and saving the
keystrokes:

  package My::HTML;
  use strict;
  use lib qw(.);
    # Global Configuration now aliased to global %c
  use My::Config (); # My/Config.pm in the same dir as script.pl
  use vars qw(%c);
  *c = \%My::Config::c;

  # Now you can access the variables from the My::Config
  print $c{scalar_var};
  print $c{array_var}[0];
  print $c{hash_var}{foo};

Of course C<%c> is global everywhere you use it as described above,
and if you change it somewhere it will affect any other packages you
have aliased C<%My::Config::c> to.

Note that aliases work either with global or C<local()> vars - you
cannot write:

  my *c = \%My::Config::c; # ERROR!

which is an error. But you can write:

  local *c = \%My::Config::c;

For more information about aliasing, refer to the Camel book, second
edition, pages 51-52.

=head2 Using Non-Hardcoded Configuration Module Names

You have just seen how to use a configuration module for
configuration centralization and easy access to the information
stored in it. However, there is somewhat of a chicken-and-egg
problem--how to let your other modules know the name of this file?
Hardcoding the name is brittle--if you have only a single project it
should be fine, but if you have more projects which use different
configurations and you want to reuse their code, you will have to
find all instances of the hardcoded name and replace them.

Another solution could be to use the same name for the configuration
module, like C<My::Config>, but to put a different copy of it into
each location. That won't work under mod_perl, though, because of
namespace collision: you cannot load different modules which use the
same name; only the first one will be loaded.

Luckily, there is another solution which allows us to stay flexible.
C<PerlSetVar> comes to the rescue. Just like with environment
variables, you can set the server's global Perl variables which can
be retrieved from any module and script. Those statements are placed
into the I<httpd.conf> file. For example:

  PerlSetVar FooBaseDir       /home/httpd/foo
  PerlSetVar FooConfigModule  Foo::Config

Now we require() the file where the above configuration will be used.
  PerlRequire /home/httpd/perl/startup.pl

In the I<startup.pl> we might have the following code:

    # retrieve the configuration module path
  use Apache;
  my $s             = Apache->server;
  my $base_dir      = $s->dir_config('FooBaseDir')      || '';
  my $config_module = $s->dir_config('FooConfigModule') || '';
  die "FooBaseDir and FooConfigModule aren't set in httpd.conf"
    unless $base_dir and $config_module;

    # build the real path to the config module
  my $path = "$base_dir/$config_module";
  $path =~ s|::|/|g;
  $path .= ".pm";
    # we have something like "/home/httpd/foo/Foo/Config.pm"

    # now we can pull in the configuration module
  require $path;

Now we know the module name and it's loaded, so for example if we
need to use some variables stored in this module to open a database
connection, we will do:

  Apache::DBI->connect_on_init
  ("DBI:mysql:${$config_module.'::DB_NAME'}::${$config_module.'::SERVER'}",
   ${$config_module.'::USER'},
   ${$config_module.'::USER_PASSWD'},
   {
    PrintError => 1, # warn() on errors
    RaiseError => 0, # don't die on error
    AutoCommit => 1, # commit executes immediately
   }
  );

Where variables like:

  ${$config_module.'::USER'}

in our example are really:

  $Foo::Config::USER

If you want to access these variables from within your code at run
time, instead of accessing the server object C<$s>, use the request
object C<$r>:

  my $r = shift;
  my $base_dir      = $r->dir_config('FooBaseDir')      || '';
  my $config_module = $r->dir_config('FooConfigModule') || '';

=head1 The Scope of the Special Perl Variables

Special Perl variables like C<$|> (buffering), C<$^T> (script's start
time), C<$^W> (warnings mode), C<$/> (input record separator), C<$\>
(output record separator) and many more are all true global
variables; they do not belong to any particular package (not even
C<main::>) and are universally available. This means that if you
change them, you change them anywhere across the entire program;
furthermore you cannot scope them with my(). However you can
local()ize them, which means that any changes you apply will only
last until the end of the enclosing scope.

In the mod_perl situation, where the child server doesn't usually
exit, if you modify a global variable in one of your scripts it will
be changed for the rest of the process' life and will affect all the
scripts executed by the same process. Therefore localizing these
variables is highly recommended, I'd say mandatory.

We will demonstrate the case using the input record separator
variable. If you undefine this variable, the diamond operator
(readline) will suck in the whole file at once if you have enough
memory. Remembering this, you should never write code like the
example below.

  $/ = undef; # BAD!
  open IN, "file" ....
  # slurp it all into a variable
  $all_the_file = <IN>;

The proper way is to have a local() keyword before the special
variable is changed, like this:

  local $/ = undef;
  open IN, "file" ....
  # slurp it all inside a variable
  $all_the_file = <IN>;

But there is a catch. local() will propagate the changed value to the
code below it. The modified value will be in effect until the script
terminates, unless it is changed again somewhere else in the script.

A cleaner approach is to enclose the whole of the code that is
affected by the modified variable in a block, like this:

  {
    local $/ = undef;
    open IN, "file" ....
    # slurp it all inside a variable
    $all_the_file = <IN>;
  }

That way when Perl leaves the block it restores the original value of
the C<$/> variable, and you don't need to worry elsewhere in your
program about its value being changed here.
Note that if you call a subroutine after you've set a global variable but within the enclosing block, the global variable will be visible with its new value inside the subroutine. =head1 Compiled Regular Expressions When using a regular expression that contains an interpolated Perl variable, if it is known that the variable (or variables) will not change during the execution of the program, a standard optimization technique is to add the C</o> modifier to the regex pattern. This directs the compiler to build the internal table once, for the entire lifetime of the script, rather than every time the pattern is executed. Consider: my $pat = '^foo$'; # likely to be input from an HTML form field foreach( @list ) { print if /$pat/o; } This is usually a big win in loops over lists, or when using the C<grep()> or C<map()> operators. In long-lived mod_perl scripts, however, the variable may change with each invocation and this can pose a problem. The first invocation of a fresh httpd child will compile the regex and perform the search correctly. However, all subsequent uses by that child will continue to match the original pattern, regardless of the current contents of the Perl variables the pattern is supposed to depend on. Your script will appear to be broken. There are two solutions to this problem: The first is to use C<eval q//>, to force the code to be evaluated each time. Just make sure that the eval block covers the entire loop of processing, and not just the pattern match itself. The above code fragment would be rewritten as: my $pat = '^foo$'; eval q{ foreach( @list ) { print if /$pat/o; } } Just saying: foreach( @list ) { eval q{ print if /$pat/o; }; } means that we recompile the regex for every element in the list even though the regex doesn't change. You can use this approach if you require more than one pattern match operator in a given section of code. If the section contains only one operator (be it an C<m//> or C<s///>), you can rely on the property of the null pattern, that reuses the last pattern seen. This leads to the second solution, which also eliminates the use of eval. The above code fragment becomes: my $pat = '^foo$'; "something" =~ /$pat/; # dummy match (MUST NOT FAIL!) foreach( @list ) { print if //; } The only gotcha is that the dummy match that boots the regular expression engine must absolutely, positively succeed, otherwise the pattern will not be cached, and the C<//> will match everything. If you can't count on fixed text to ensure the match succeeds, you have two possibilities. If you can guarantee that the pattern variable contains no meta-characters (things like *, +, ^, $...), you can use the dummy match: $pat =~ /\Q$pat\E/; # guaranteed if no meta-characters present If there is a possibility that the pattern can contain meta-characters, you should search for the pattern or the non-searchable \377 character as follows: "\377" =~ /$pat|^\377$/; # guaranteed if meta-characters present Another approach: It depends on the complexity of the regex to which you apply this technique. One common usage where a compiled regex is usually more efficient is to "I<match any one of a group of patterns>" over and over again. Maybe with a helper routine, it's easier to remember. Here is one slightly modified from Jeffery Friedl's example in his book "I<Mastering Regular Expressions>". 
  #####################################################
  # Build_MatchMany_Function
  # -- Input:  list of patterns
  # -- Output: A code ref which matches its $_[0]
  #            against ANY of the patterns given in the
  #            "Input", efficiently.
  #
  sub Build_MatchMany_Function {
    my @R = @_;
    my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
    my $matchsub = eval "sub { $expr }";
    die "Failed in building regex @R: $@" if $@;
    $matchsub;
  }

Example usage:

  @some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
  $Known_Browser = Build_MatchMany_Function(@some_browsers);

  while (<ACCESS_LOG>) {
    # ...
    $browser = get_browser_field($_);
    if ( ! &$Known_Browser($browser) ) {
      print STDERR "Unknown Browser: $browser\n";
    }
    # ...
  }

And of course you can use the qr() operator, which makes the code even
more efficient:

  my $pat = '^foo$';
  my $re  = qr($pat);
  foreach( @list ) {
    print if /$re/;
  }

The qr() operator compiles the pattern once per request, and the
compiled version is then reused in the actual match.

=head1 Exception Handling for mod_perl

Here are some guidelines for S<clean(er)> exception handling in
mod_perl, although the technique presented can be applied to all of
your Perl programming.

The reasoning behind this document is the current broken status of
C<$SIG{__DIE__}> in the perl core - see both the perl5-porters and the
mod_perl mailing list archives for details on this discussion.  (It's
broken in at least Perl v5.6.0 and probably in later versions as
well.)  In short summary, C<$SIG{__DIE__}> is a little bit too global,
and catches exceptions even when you want to catch them yourself,
using an C<eval{}> block.

=head2 Trapping Exceptions in Perl

To trap an exception in Perl we use the C<eval{}> construct.  Many
people initially make the mistake of assuming that this is the same as
the C<eval EXPR> construct, which compiles and executes code at run
time, but that's not the case.  C<eval{}> compiles at compile time,
just like the rest of your code, and has next to zero run-time
penalty.  For the hardcore C programmers among you, it uses the
C<setjmp/longjmp> POSIX routines internally, just like C++ exceptions.

When in an eval block, if the code being executed die()'s for any
reason, an exception is thrown.  This exception can be caught by
examining the C<$@> variable immediately after the eval block; if
C<$@> is true then an exception occurred and C<$@> contains the
exception in the form of a string.  The full construct looks like
this:

  eval {
    # Some code here
  }; # Note important semi-colon there
  if ($@) # $@ contains the exception that was thrown
  {
    # Do something with the exception
  }
  else # optional
  {
    # No exception was thrown
  }

Most of the time when you see these exception handlers there is no
else block, because it tends to be OK if the code didn't throw an
exception.
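For example, here is a minimal sketch of trapping a failure and
recovering from it instead of letting the request abort; the file name
in C<$filename> and the empty-string fallback are made up purely for
illustration:

  my $content = eval {
      open IN, "<$filename"
          or die "Can't open $filename: $!\n";  # throws the exception
      local $/;                                 # slurp mode, only inside this block
      my $data = <IN>;
      close IN;
      $data;                                    # value of the eval block on success
  };
  if ($@) {
      # any die() inside the block lands here as a string in $@
      warn "falling back to an empty page: $@";
      $content = '';
  }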
Perl's exception handling is similar to that of other languages,
though it may not seem so at first sight:

  Perl                               Other language
  ---------------------------------  ------------------------------------
  eval {                             try {
    # execute here                     // execute here
    # raise our own exception:         // raise our own exception:
    die "Oops" if /error/;             if(error==1){throw Exception.Oops;}
    # execute more                     // execute more
  } ;                                }
  if($@) {                           catch {
    # handle exceptions                switch( Exception.id ) {
    if( $@ =~ /Fail/ ) {                 Fail : fprintf( stderr, "Failed\n" ) ;
      print "Failed\n" ;                        break ;
    }
    elsif( $@ =~ /Oops/ ) {              Oops : throw Exception ;
      # Pass it up the chain
      die if $@ =~ /Oops/;
    } else {                             default :
      # handle all other                   # handle all other
    }                                      # exceptions here
  }                                    }
                                     // If we got here all is OK or handled
  else {  # optional
    # all is well
  }
  # all is well or has been handled

=head2 Alternative Exception Handling Techniques

An often suggested method for handling global exceptions in mod_perl,
and in other perl programs in general, is a B<__DIE__> handler, which
can be set up by either assigning a function name as a string to
C<$SIG{__DIE__}> (not particularly recommended, because of possible
namespace clashes) or assigning a code reference to C<$SIG{__DIE__}>.
The usual way of doing so is to use an anonymous subroutine:

  $SIG{__DIE__} = sub { print "Eek - we died with:\n", $_[0]; };

The problem with this is that C<$SIG{__DIE__}> is a global setting in
your script, so while you can potentially hide away your exceptions in
some external module, the execution of C<$SIG{__DIE__}> is fairly
magical, and interferes not just with your code, but with all code in
every module you import.

Beyond the magic involved, C<$SIG{__DIE__}> actually interferes with
perl's normal exception handling mechanism, the C<eval{}> construct.
Witness:

  $SIG{__DIE__} = sub { print "handler\n"; };

  eval {
    print "In eval\n";
    die "Failed for some reason\n";
  };
  if ($@) {
    print "Caught exception: $@";
  }

The code unfortunately prints out:

  In eval
  handler
  Caught exception: Failed for some reason

The C<handler> line is the surprise: our C<$SIG{__DIE__}> handler was
called even though we were catching the exception ourselves.  That
isn't quite what you would expect, especially if that
C<$SIG{__DIE__}> handler is hidden away deep in some other module that
you didn't know about.

There are workarounds, however.  One is to localize C<$SIG{__DIE__}>
in every exception trap you write:

  eval {
    local $SIG{__DIE__};
    ...
  };

Obviously this just doesn't scale - you don't want to be doing that
for every exception trap in your code, and it's a slowdown.

A second workaround is to check in your handler whether you are inside
an eval, and thus trying to catch the exception yourself:

  $SIG{__DIE__} = sub {
    die $_[0] if $^S;
    print "handler\n";
  };

However this won't work under C<Apache::Registry> - you're always in
an eval block there!

The other problem with C<$SIG{__DIE__}> also relates to its global
nature.  Because you might have more than one application running
under mod_perl, you can't be sure which has set a C<$SIG{__DIE__}>
handler when and for what.  This can become extremely confusing when
you start scaling up from a set of simple registry scripts that might
rely on CGI::Carp for global exception handling (which uses
C<$SIG{__DIE__}> to trap exceptions) to having many applications
installed with a variety of exception handling mechanisms in place.

You should warn people about this danger of C<$SIG{__DIE__}> and
inform them of better ways to code.  The following material is an
attempt to do just that.

=head2 Better Exception Handling

The C<eval{}> construct in itself is a fairly weak way to handle
exceptions as strings.
There's no way to pass more information in your exception, so you have
to handle your exception in more than one place - at the location the
error occurred, in order to construct a sensible error message, and
again in your exception handler to de-construct that string into
something meaningful (unless of course all you want your exception
handler to do is dump the error to the browser).  The other problem is
that you have no way of automatically detecting where the exception
occurred using the C<eval{}> construct.  In a C<$SIG{__DIE__}> block
you always have the use of the caller() function to detect where the
error occurred.  But we can fix that...

A little known fact about exceptions in perl 5.005 is that you can
call die with an object.  The exception handler receives that object
in C<$@>.  This is how you are advised to handle exceptions now, as it
provides an extremely flexible and scalable exceptions solution,
potentially providing almost all of the power of Java exceptions.

[As a footnote here, the only thing that is really missing from Java
exceptions is a guaranteed "finally" clause, although it's possible to
get about 98.62% of the way towards providing that using C<eval{}>.]

=head3 A Little Housekeeping

First though, before we delve into the details, a little housekeeping
is in order.  Most, if not all, mod_perl programs consist of a main
routine that is entered and which then dispatches itself to a routine
depending on the parameters passed and/or the form values.  In a
normal C program this is your main() function, in a mod_perl handler
this is your handler() function/method.  The exception to this rule
seems to be Apache::Registry scripts, although the techniques
described here can be easily adapted.

In order for you to be able to use exception handling to its best
advantage you need to change your script to have some sort of global
exception handling.  This is far simpler than it sounds.  If you're
using C<Apache::Registry> to emulate CGI you might consider wrapping
your entire script in one big eval block, but I would discourage that.
A better method would be to modularize your script into discrete
function calls, one of which should be a dispatch routine:

  #!/usr/bin/perl -w
  # Apache::Registry script

  eval {
    dispatch();
  };
  if ($@) {
    # handle exception
  }

  sub dispatch {
    ...
  }

This is easier with an ordinary mod_perl handler, as it is natural to
have separate functions, rather than one long run-on script:

  MyHandler.pm
  ------------
  sub handler {
    my $r = shift;

    eval {
      dispatch($r);
    };
    if ($@) {
      # handle exception
    }
  }

  sub dispatch {
    my $r = shift;
    ...
  }

Now that the skeleton code is set up, let's create an exception class,
making use of Perl 5.005's ability to throw exception objects.

=head3 An Exception Class

This is a really simple exception class, which does nothing but
contain information.  A better implementation would probably also
handle its own exception conditions, but that would be more complex,
requiring separate packages for each exception type.

  My/Exception.pm
  ---------------
  package My::Exception;

  sub AUTOLOAD {
    no strict 'refs', 'subs';
    if ($AUTOLOAD =~ /.*::([A-Z]\w+)$/) {
      my $exception = $1;
      *{$AUTOLOAD} =
        sub {
          shift;
          my ($package, $filename, $line) = caller;
          push @_, caller => {
            package  => $package,
            filename => $filename,
            line     => $line,
          };
          bless { @_ }, "My::Exception::$exception";
        };
      goto &{$AUTOLOAD};
    }
    else {
      die "No such exception class: $AUTOLOAD\n";
    }
  }

  1;

OK, so this is all highly magical, but what does it do?
It creates a simple package that we can import and use as follows:

  use My::Exception;
  die My::Exception->SomeException( foo => "bar" );

The exception class tracks exactly where we died from, using the
caller() mechanism; it also caches exception classes so that
C<AUTOLOAD> is only called the first time (in a given process) an
exception of a particular type is thrown (particularly relevant under
mod_perl).

=head2 Catching Uncaught Exceptions

What about exceptions that are thrown outside of your control?  We can
fix this using one of two possible methods.  The first is to override
die globally using the old magical C<$SIG{__DIE__}>, and the second is
the cleaner, non-magical method of overriding the core die() function
with your own die() that throws an exception that makes sense to your
application.

=head3 Using $SIG{__DIE__}

Overloading using C<$SIG{__DIE__}> in this case is rather simple;
here's some code:

  $SIG{__DIE__} = sub {
      # wrap plain string exceptions in our class and re-throw;
      # objects are re-thrown unchanged
      if (!ref($_[0])) {
          die My::Exception->UnCaught(text => join('', @_));
      }
      die $_[0];
  };

All this does is catch your exception and re-throw it.  It's not as
dangerous as we stated earlier that C<$SIG{__DIE__}> can be, because
we're actually re-throwing the exception, rather than catching it and
stopping there.  Even though C<$SIG{__DIE__}> is a global handler,
because we are simply re-throwing the exception we can let other
applications outside of our control simply catch the exception and not
worry about it.

There's only one slight buggette left, and that's if some external
code catches the exception thrown by its own die() and tries to do
string comparisons on it, as in:

  eval {
    ... # some code
    die "FATAL ERROR!\n";
  };
  if ($@) {
    if ($@ =~ /^FATAL ERROR/) {
      die $@;
    }
  }

In order to deal with this, we can overload stringification for our
C<My::Exception::UnCaught> class:

  {
    package My::Exception::UnCaught;
    use overload '""' => \&str;

    sub str {
      shift->{text};
    }
  }

We can now let other code happily continue.  Note that there is a bug
in Perl 5.6 which may affect people here: stringification does not
occur when an object is operated on by a regular expression (via the
=~ operator).  A workaround is to explicitly stringify using qq double
quotes; however, that doesn't help the poor soul who is using other
applications.  This bug has been fixed in later versions of Perl.

=head3 Overriding the Core die() Function

So what if we don't want to touch C<$SIG{__DIE__}> at all?  We can
overcome this by overriding the core die function.  This is slightly
more complex than implementing a C<$SIG{__DIE__}> handler, but is far
less magical, and is the right thing to do, according to the
L<perl5-porters mailing list|guide::help/Get_help_with_Perl>.

Overriding core functions has to be done from an external
package/module.  So we're going to add that to our C<My::Exception>
module.  Here are the relevant parts:

  use vars qw/@ISA @EXPORT/;
  use Exporter;

  @EXPORT = qw/die/;
  @ISA = 'Exporter';

  sub die (@); # prototype to match CORE::die

  sub import {
      my $pkg = shift;
      $pkg->export('CORE::GLOBAL', 'die');
      Exporter::import($pkg,@_);
  }

  sub die (@) {
      if (!ref($_[0])) {
          CORE::die My::Exception->UnCaught(text => join('', @_));
      }
      CORE::die $_[0]; # only use the first element because it's an object
  }

That wasn't so bad, was it?  We're relying on Exporter's export()
function to do the hard work for us, exporting the die() function into
the C<CORE::GLOBAL> namespace.  If we don't want to overload die()
everywhere, this can still be an extremely useful technique.
By just using Exporter's default import() method we can export our new
die() method into any package of our choosing.  This allows us to
short-cut the long calling convention and simply die() with a string,
and let the system handle the actual construction into an object for
us.

Along with the above overloaded stringification, we now have a
complete exception system (well, mostly complete.  Exception die-hards
would argue that there's no "finally" clause, and no exception stack,
but that's another topic for another time).

=head2 A Single UnCaught Exception Class

Until the Perl core gets its own base exception class (which will
likely happen for Perl 6, but not sooner), it is vitally important
that you decide upon a single base exception class for all of the
applications that you install on your server, and a single exception
handling technique.

The problem comes when you have multiple applications all doing
exception handling and all expecting a certain type of "UnCaught"
exception class.  Witness the following application:

  package Foo;

  eval {
    # do something
  };
  if ($@) {
    if ($@->isa('Foo::Exception::Bar')) {
      # handle "Bar" exception
    }
    elsif ($@->isa('Foo::Exception::UnCaught')) {
      # handle uncaught exceptions
    }
  }

All will work well until someone installs application "TrapMe" on the
same machine, which installs its own UnCaught exception handler,
overloading CORE::GLOBAL::die or installing a $SIG{__DIE__} handler.
This is actually a case where using $SIG{__DIE__} might be preferable,
because you can change your handler() routine to look like this:

  sub handler {
    my $r = shift;

    local $SIG{__DIE__};
    Foo::Exception->Init(); # sets $SIG{__DIE__}

    eval {
      dispatch($r);
    };
    if ($@) {
      # handle exception
    }
  }

  sub dispatch {
    my $r = shift;
    ...
  }

In this case the fact that $SIG{__DIE__} can be local()ized to the
current handler has helped us, something we couldn't achieve by
overloading CORE::GLOBAL::die.  However there is still a gotcha.  If
someone has overloaded die() in one of the applications installed on
your mod_perl machine, you still get the same problems.  So in short:
watch out, and check the source code of anything you install to make
sure it follows your exception handling technique, or just uses die()
with strings.

=head2 Some Uses

I'm going to come right out and say now: I abuse this system horribly!
I throw exceptions all over my code, not because I've hit an
"exceptional" bit of code, but because I want to get straight back out
of the current call stack without having every single level of
function call check error codes.

One way I use this is to return Apache return codes:

  # paranoid security check
  die My::Exception->RetCode(code => 204);

This returns a 204 status code (C<HTTP_NO_CONTENT>), which is caught
at my top level exception handler:

  if ($@->isa('My::Exception::RetCode')) {
    return $@->{code};
  }

That last return statement is in my handler() method, so that's the
return code that Apache actually sends.  I have other exception
handlers in place for sending Basic Authentication headers and
Redirect headers out.  I also have a generic C<My::Exception::OK>
class, which gives me a way to back out completely from where I am,
but register that as an OK thing to do.

Why do I go to these extents?  After all, code like slashcode (the
code behind http://slashdot.org) doesn't need this sort of thing, so
why should my web site?  Well, it's just a matter of scalability and
programmer style really.
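To make the top-level catch concrete, here is a hedged sketch of what
such a handler() might look like.  The C<My::Exception::RetCode> and
C<My::Exception::Redirect> class names follow the AUTOLOAD convention
shown earlier, but the exact fields (C<code>, C<url>) and the
dispatch() routine are illustrative assumptions only:

  use Apache::Constants qw(OK REDIRECT SERVER_ERROR);

  sub handler {
      my $r = shift;

      eval { dispatch($r) };
      return OK unless $@;                   # no exception - normal response

      if (ref $@ && $@->isa('My::Exception::RetCode')) {
          return $@->{code};                 # e.g. 204, sent as-is by Apache
      }
      if (ref $@ && $@->isa('My::Exception::Redirect')) {
          $r->header_out(Location => $@->{url});
          return REDIRECT;
      }

      $r->log_error("unhandled exception: $@");
      return SERVER_ERROR;                   # anything we don't recognize
  }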
There's a lot of literature out there about exception handling, so I
suggest doing some research.

=head2 Conclusions

Here I've demonstrated a simple, scalable (and useful) exception
handling mechanism that fits perfectly with your current code and
provides the programmer with an excellent means to determine what has
happened in his code.

Some users might be worried about the overhead of such code.  However,
in use I've found accessing the database to be a much more significant
overhead, and this is used in some code delivering to thousands of
users.

For similar exception handling techniques, see the section
L<Other Implementations|guide::perl/Other_Implementations>.

=head2 The My::Exception class in its entirety

  package My::Exception;

  use vars qw/@ISA @EXPORT $AUTOLOAD/;
  use Exporter;
  @ISA = 'Exporter';
  @EXPORT = qw/die/;

  sub die (@);

  sub import {
      my $pkg = shift;
      # allow "use My::Exception 'die';" to mean import locally only
      $pkg->export('CORE::GLOBAL', 'die') unless @_;
      Exporter::import($pkg,@_);
  }

  sub die (@) {
      if (!ref($_[0])) {
          CORE::die My::Exception->UnCaught(text => join('', @_));
      }
      CORE::die $_[0];
  }

  {
      package My::Exception::UnCaught;
      use overload '""' => sub { shift->{text} };
  }

  sub AUTOLOAD {
      no strict 'refs', 'subs';
      if ($AUTOLOAD =~ /.*::([A-Z]\w+)$/) {
          my $exception = $1;
          *{$AUTOLOAD} =
            sub {
                shift;
                my ($package, $filename, $line) = caller;
                push @_, caller => {
                    package  => $package,
                    filename => $filename,
                    line     => $line,
                };
                bless { @_ }, "My::Exception::$exception";
            };
          goto &{$AUTOLOAD};
      }
      else {
          die "No such exception class: $AUTOLOAD\n";
      }
  }

  1;

=head2 Other Implementations

Some users might find it very useful to have the more C++/Java-like
interface of try/catch functions.  These are available in several
forms that all work in slightly different ways.  See the documentation
for each module for details:

=over

=item * Error.pm

Graham Barr's excellent OO styled "try, throw, catch" module (from
L<CPAN|guide::download/Perl>).  This should be considered your best
option for structured exception handling because it is well known,
well supported and used by a lot of other applications.

=item * Exception::Class and Devel::StackTrace

By Dave Rolsky, both available from CPAN of course.

C<Exception::Class> is a bit cleaner than the C<AUTOLOAD> method from
above as it can catch typos in exception class names, whereas the
method above will automatically create a new class for you.  In
addition, it lets you create actual class hierarchies for your
exceptions, which can be useful if you want to create exception
classes that provide extra methods or data.  For example, an exception
class for database errors could provide a method for returning the SQL
and bound parameters in use at the time of the error.

=item * Try.pm

Tony Olekshy's try/catch module.  Adds an unwind stack and some other
interesting features.  Not on the CPAN.  Available at
http://www.avrasoft.com/perl/rfc/try-1136.zip

=back

=head1 Maintainers

Maintainer is the person(s) you should contact with updates,
corrections and patches.

=over

=item * Stas Bekman E<lt>stas (at) stason.orgE<gt>

=back

=head1 Authors

=over

=item * Stas Bekman E<lt>stas (at) stason.orgE<gt>

=item * Matt Sergeant E<lt>matt (at) sergeant.orgE<gt>

=back

Only the major authors are listed above.  For contributors see the
Changes file.

=cut