intro what_is_new.pod

stas 5 Dec 2002 08:55:51 -0000

stas        2002/12/05 00:55:49

  Added:       src/docs/2.0/user/intro what_is_new.pod
  Log:
  contribute a chapter from my tutorial: "what's new under the sun", which
  supersedes mostly the overview chapter that we have
  
  Revision  Changes    Path
  1.1                  modperl-docs/src/docs/2.0/user/intro/what_is_new.pod
  
  Index: what_is_new.pod
  ===================================================================
  =head1 NAME
  
  What's new under the sun?
  
  =head1 Description
  
  This chapter presents the new features of Apache 2.0, Perl 5.6.0 -
  5.8.0 and their influence on mod_perl 2.0.
  
  =head1 Why mod_perl, the Next Generation
  
  Here and in the rest of this document we refer to the mod_perl 1.x
  series as mod_perl 1.0 and to the 2.0.x series as mod_perl 2.0, to
  keep things simple. Similarly, we refer to the Apache 1.3.x series as
  Apache 1.3 and to the 2.0.x series as Apache 2.0.
  
  Ever since Doug MacEachern introduced mod_perl 1.0 in 1996, he had to
  adjust the source code to the many changes Apache and Perl went
  through, while staying compatible with the older versions. This led to
  very complex source code, with hundreds of C<#ifdefs> and workarounds
  for various incompatibilities in older Perl and Apache versions. When
  Apache 2.0 development was underway, a new threads design was
  introduced, which couldn't be supported by the existing Perl versions,
  since it required thread-safe Perl interpreters.
  
  Think of it as a conspiracy or just a lucky coincidence: on March 10,
  2000, the first Apache 2.0 alpha version was released. 13 days later,
  on March 23, 2000, Perl 5.6.0 was released. And guess what: Perl 5.6.0
  was the first Perl version to support internal thread-safety across
  multiple interpreters.
  
  Since Perl 5.6.0 and Apache 2.0 became the minimum requirements, there
  was no need to support older versions, and it made sense to start the
  mod_perl 2.0 code base from scratch, incorporating the lessons learned
  during the 5 years of mod_perl's existence.
  
  The new version includes a mechanism for automatically building the
  Perl interface to the Apache API, which allowed us to easily adjust
  mod_perl 2.0 to the ever-changing Apache 2.0 API during its
  development period. Another important feature is the C<Apache::Test>
  framework, which was originally developed for mod_perl 2.0 but was
  then adopted by the Apache 2.0 developers to test the core server
  features and third-party modules. Moreover, tests written using the
  C<Apache::Test> framework can be run with both Apache 1.3 and 2.0,
  assuming that both support the features being tested.
  
  Many other interesting changes have already happened to mod_perl in
  version 2.0, and more will be developed in the future. Some of these
  are covered in this document, and others you will discover on your own
  while reading the mod_perl documentation.
  
  =head1 What's new in Apache 2.0
  
  Apache 2.0 has introduced numerous new features and enhancements. Here
  are the most important ones:
  
  =over
  
  =item * I<Apache Portable Runtime> (APR)
  
  The APR presents a standard API for server applications, covering file
  I/O, logging, shared memory, threads, managing child processes and
  much other functionality needed for developing the Apache core and
  third-party modules in a portable and efficient way. One important
  effect is that it significantly simplifies the code that uses it,
  making the Apache code much easier to review and understand, which in
  turn increases the number of bugs found and patches contributed.
  
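  The APR uses the concept of memory pools: memory is allocated from a
  pool tied to a well-defined lifetime (for example, a request or a
  connection) and is released all at once when that pool is cleared or
  destroyed. This significantly simplifies memory management code and
  reduces the chance of the memory leaks that have always haunted C
  programmers.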
  
  =item * I/O Filtering
  
  Apache 2.0 allows multiple modules to filter both the request and the
  response. One module can now pipe its output as input to another
  module, which sees the data as if it were receiving it directly from
  the TCP stream. The same mechanism works for the generated response.
  
  With I/O filtering in place, things like SSL, data (de)compression
  and other manipulations become very easy to implement.

  I/O filtering is based on the concept of bucket brigades, which are
  implemented in the APR.
  
  
  =item * I<Multi-Processing Modules> (MPMs)
  
  In the previous Apache generation the same code base tried to manage
  incoming requests on all platforms, which led to scalability problems
  on certain platforms, mainly on those that differ from Unix. It also
  led to undesired complexity of the code.
  
  Apache 2.0 introduces the concept of Multi-Processing Modules, whose
  main responsibility is to map incoming requests to threads, processes
  or a hybrid of both. Now it's possible to write different processing
  modules specific to various platforms. For example, Apache 2.0 on
  Windows is much more efficient now, since it uses I<mpm_winnt>, which
  relies on native Windows features.
  
  Here is a partial list of major MPMs available as of this writing.
  
  =over
  
  =item prefork
  
  The I<prefork> MPM emulates Apache 1.3's preforking model, where each
  request is handled by a separate pre-forked child process, and each
  child handles only one request at a time.
  
  =item worker
  
  The I<worker> MPM implements a hybrid multi-process multi-threaded
  approach based on the I<pthreads> standard. Each child process uses
  one acceptor (listener) thread and multiple worker threads.
  
  =item mpmt_os2, netware, winnt and beos
  
  These MPMs also implement the hybrid multi-process/multi-threaded
  model, with each based on native OS thread implementations.
  
  =back
  
  On platforms that support more than one MPM, it's possible to switch
  MPMs as the needs change (in Apache 2.0 the MPM is chosen when the
  server is built). For example, on Unix it's possible to start with the
  prefork MPM. Then, when demand grows and the code matures, it's
  possible to migrate to a more efficient threaded MPM, assuming that
  the code base is capable of running in a threaded environment.
  
  =item * New Hook Scheme
  
  In Apache 2.0 it's possible to dynamically register functions for each
  Apache hook, and more than one function can be registered per hook.
  Moreover, when adding a new function, it's possible to specify where
  it should be placed, e.g. between two already registered functions or
  in front of them.
  
  =item * Protocol Modules
  
  The previous Apache generation could speak only the HTTP
  protocol. Apache 2.0 has introduced a "server framework" architecture
  making it possible to plug in handlers for protocols other than HTTP.
  The protocol module design also abstracts the transport layer so
  protocols such as SSL can be hooked into the server without requiring
  modifications to the Apache source code.  This allows Apache to be
  extended much further than in the past, making it possible to add
  support for protocols such as FTP, SMTP, RPC flavors and the like.
  The main advantage is that protocol plugins can take advantage of
  Apache's portability, process/thread management, configuration
  mechanism and plugin API.
  
  =item * Parsed Configuration Tree
  
  Apache 2.0 makes the parsed configuration tree available at run time,
  so modules needing to read the configuration data (e.g., mod_info)
  don't have to re-parse the configuration file, but can re-use the
  parsed tree.
  
  =back
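  The ordering rules of the new hook scheme can be modelled in a few
  lines of Perl. The following is a minimal sketch, not Apache's C
  implementation: C<register_hook()>, C<run_hook()> and the module names
  are hypothetical, and only the numeric positions are borrowed from
  APR's C<APR_HOOK_*> constants. Real Apache hooks can also name
  predecessor and successor modules, which this sketch leaves out.

```perl
use strict;
use warnings;

# Positions mirror the values of APR's APR_HOOK_REALLY_FIRST .. APR_HOOK_REALLY_LAST.
use constant {
    HOOK_REALLY_FIRST => -10,
    HOOK_FIRST        => 0,
    HOOK_MIDDLE       => 10,
    HOOK_LAST         => 20,
    HOOK_REALLY_LAST  => 30,
};

my %hooks;        # hook name => [ [position, sequence, module, callback], ... ]
my $sequence = 0; # registration counter, used as a tie-breaker

# Register a callback for a hook at a given position (hypothetical helper).
sub register_hook {
    my ($hook, $module, $position, $callback) = @_;
    push @{ $hooks{$hook} }, [ $position, $sequence++, $module, $callback ];
    return;
}

# Run all callbacks of a hook: lower positions first; callbacks sharing a
# position run in registration order. Returns the module names in run order.
sub run_hook {
    my ($hook, @args) = @_;
    my @ran;
    my @entries = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
                  @{ $hooks{$hook} || [] };
    for my $entry (@entries) {
        $entry->[3]->(@args);
        push @ran, $entry->[2];
    }
    return @ran;
}

register_hook(fixups => 'mod_b', HOOK_MIDDLE, sub { });
register_hook(fixups => 'mod_c', HOOK_LAST,   sub { });
register_hook(fixups => 'mod_a', HOOK_FIRST,  sub { });  # registered later, runs first
register_hook(fixups => 'mod_d', HOOK_MIDDLE, sub { });  # same slot as mod_b, runs after it

my @order = run_hook('fixups');
print "@order\n";   # prints "mod_a mod_b mod_d mod_c"
```

  Using a registration counter as the tie-breaker keeps the result
  deterministic for callbacks registered at the same position.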
  
  All these new features boost Apache's performance, scalability and
  flexibility. The APR helps overall performance by performing many
  platform-specific optimizations internally and by giving the
  developer an API that is already highly optimized.
  
  Apache 2.0 also includes special modules that can boost
  performance. For example, the mod_file_cache module (the successor of
  Apache 1.3's mod_mmap_static) can map web pages into virtual memory
  and serve them directly, avoiding the overhead of the I<open()> and
  I<read()> system calls needed to pull them in from the filesystem.
  
  I/O filtering helps performance too, since modules no longer need to
  waste memory and CPU cycles manually storing data in shared memory or
  I<pnotes> in order to pass it to another module, e.g., to
  gzip-compress the response.
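  The chaining idea can be modelled in plain Perl. In this minimal
  sketch each filter is a code reference that receives a chunk from the
  previous stage and returns the transformed chunk to the next one,
  roughly what happens to the data of a bucket brigade as it passes
  through Apache's filter chain. C<run_filters()> and the two filters
  are hypothetical; mod_perl's own filter API and real bucket brigades
  are not shown.

```perl
use strict;
use warnings;

# Hypothetical output filters; each one sees only the output of the previous one.
my @output_filters = (
    sub { my ($chunk) = @_; return uc $chunk },                     # rewrite the content
    sub { my ($chunk) = @_; $chunk =~ s/\n/\r\n/g; return $chunk },  # fix line endings
);

# Push every chunk of the response through the whole chain, in order.
sub run_filters {
    my ($filters, @chunks) = @_;
    my $output = '';
    for my $chunk (@chunks) {
        my $data = $chunk;
        $data = $_->($data) for @$filters;
        $output .= $data;
    }
    return $output;
}

my $response = run_filters(\@output_filters, "hello\n", "world\n");
print $response;
```

  Because each stage only sees its predecessor's output, a compression
  stage could be appended to C<@output_filters> without the other
  filters having to know about it.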
  
  Last but not least, these features simplify development and add
  flexibility for both core and third-party Apache module developers.
  
  =head1 What's new in Perl 5.6.0 - 5.8.0
  
  As mentioned earlier, Perl 5.6.0 is the minimum requirement for
  mod_perl 2.0, although, as we will see later, certain new features
  work only with Perl 5.8.0 and higher.
  
  These are the important changes in recent Perl versions that had an
  impact on mod_perl. For a complete list of changes, see the
  I<perldelta> manpage corresponding to the version you use.
  
  The 5.6 Perl generation has introduced the following features:
  
  =over
  
  =item *
  
  The beginnings of support for running multiple interpreters
  concurrently in different threads.  In conjunction with the
  perl_clone() API call, which can be used to selectively duplicate the
  state of any given interpreter, it is possible to compile a piece of
  code once in an interpreter, clone that interpreter one or more times,
  and run all the resulting interpreters in distinct threads. See the
  I<perlembed> and I<perl561delta> manpages.
  
  =item *
  
  The core support for declaring subroutine attributes, which is used by
  mod_perl 2.0's I<method handlers>. See the I<attributes> manpage.
  
  =item *
  
  The I<warnings> pragma, which can force code to be warnings-clean via
  the setting:

    use warnings FATAL => 'all';

  which turns every warning into a fatal error, aborting any code that
  triggers one. The pragma also allows fine-grained control over which
  warnings are reported. See the I<perllexwarn> manpage.
  
  =item *
  
  Certain C<CORE::> functions can now be overridden via the
  C<CORE::GLOBAL::> namespace. For example, mod_perl can now override
  C<CORE::exit()> via C<CORE::GLOBAL::exit>. See the I<perlsub> manpage.
  
  =item *
  
  The C<XSLoader> extension as a simpler alternative to C<DynaLoader>.
  See the I<XSLoader> manpage.
  
  =item *
  
  Large file support. If you have filesystems that support "large
  files" (files larger than 2 gigabytes), you may now also be able to
  create and access them from Perl. See the I<perl561delta> manpage.
  
  =item *
  
  Multiple performance enhancements were made. See the I<perl561delta>
  manpage.
  
  =item *
  
  Numerous memory leaks were fixed. See the I<perl561delta> manpage.
  
  =item *
  
  Improved security features: more potentially unsafe operations taint
  their results for improved security. See the I<perlsec> and
  I<perl561delta> manpages.
  
  =item *
  
  Available on new platforms: GNU/Hurd, Rhapsody/Darwin, EPOC.
  
  =back
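  Two of the features above, fatal warnings and overriding a core
  function through C<CORE::GLOBAL::>, can be tried in a short script.
  The override below only mimics the idea behind mod_perl's exit()
  replacement, namely ending the current piece of code without killing
  the whole process; C<legacy_script()>, C<$intercept_exit> and
  C<@exit_codes> are hypothetical names, and mod_perl's real
  implementation is different.

```perl
use strict;
use warnings FATAL => 'all';    # every warning becomes a fatal error

our $intercept_exit = 0;        # hypothetical switch: intercept exit() or not
our @exit_codes;                # codes passed to intercepted exit() calls

BEGIN {
    # The override must be installed at compile time: only exit() calls
    # compiled after this point are routed through CORE::GLOBAL::exit.
    *CORE::GLOBAL::exit = sub {
        my $code = @_ ? $_[0] : 0;
        CORE::exit($code) unless $intercept_exit;    # normal behavior
        push @exit_codes, $code;
        die "exit($code) intercepted\n";             # unwind instead of ending the process
    };
}

# A hypothetical legacy script that would normally end the whole process.
sub legacy_script {
    print "doing some work\n";
    exit 3;
}

# 1. Intercept exit() while running the script.
my $exit_error = do {
    local $intercept_exit = 1;
    eval { legacy_script(); 1 } ? '' : $@;
};
print "caught: $exit_error";

# 2. Fatal warnings: using an undefined value now aborts the eval.
my $warn_ok = eval {
    my $undefined;
    my $text = "value: " . $undefined;    # triggers an "uninitialized" warning
    1;
};
my $warn_error = $warn_ok ? '' : $@;
print "fatal warning: $warn_error";
```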
  
  Overall, many bugs and problems were fixed in Perl 5.6.1, so if you
  plan to run the 5.6 generation, you should run at least 5.6.1. It is
  possible that 5.6.2 will be out by the time this tutorial is printed.
  
  Perl 5.8.0 introduced the following features:
  
  =over
  
  =item *
  
  The PerlIO layer, introduced as experimental in 5.6.0, has been
  stabilized and became the default IO layer in 5.8.0. Now an IO stream
  can be filtered through multiple layers. See the I<perlapio> and
  I<perliol> manpages.
  
  For example, this allows mod_perl to interoperate with the APR IO
  layer and even to use the APR IO layer in Perl code. See the
  I<APR::PerlIO> manpage.
  
  Another example of using the new feature is the extension of the
  open() functionality to create anonymous temporary files via:
  
     open my $fh, "+>", undef or die $!;
  
  That is a literal undef(), not an undefined value. See the open()
  entry in the I<perlfunc> manpage.
  
  =item *
  
  More keywords can be overridden via C<CORE::GLOBAL::>. See the
  I<perlsub> manpage.
  
  =item * 
  
  Signal handling in Perl has been notoriously unsafe, because signals
  could arrive at inopportune moments, leaving Perl in an inconsistent
  state. Now Perl delays signal handling until it is safe. See the
  I<perlipc> manpage.
  
  =item *
  
  C<File::Temp> was added to create temporary files and directories in
  an easy, portable and secure way. See the I<File::Temp> manpage.
  
  =item *
  
  A new command-line option, C<-t> is available.  It is the little
  brother of C<-T>: instead of dying on taint violations, lexical
  warnings are given.  This is only meant as a temporary debugging aid
  while securing the code of old legacy applications.  B<This is not a
  substitute for C<-T>.> See the I<perlrun> manpage.
  
  A new special variable C<${^TAINT}> was introduced. It indicates
  whether taint mode is enabled. See the I<perlvar> manpage.
  
  =item *
  
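  The threads implementation has been much improved since 5.6: the new
  I<threads> and I<threads::shared> modules are based on interpreter
  threads (ithreads). See the I<perlthrtut> manpage.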
  
  =item *
  
  Much better Unicode support. See the I<perlunicode> manpage.
  
  =item *
  
  Numerous bugs and memory leaks were fixed. For example, you can now
  localize the tied C<Apache::DBI> database handles without leaking
  memory.
  
  =item *
  
  Available on new platforms: AtheOS, Mac OS Classic, Mac OS X, MinGW,
  NCR MP-RAS, NonStop-UX, NetWare and UTS. The following platforms are
  again supported: BeOS, DYNIX/ptx, POSIX-BC, VM/ESA, z/OS (OS/390).
  
  
  =back
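  Several of the Perl 5.8.0 additions listed above can be exercised
  together. The sketch below uses only core features and modules: the
  anonymous temporary file form of open(), an C<:encoding> PerlIO layer
  on an in-memory filehandle, C<File::Temp> and C<${^TAINT}>. The file
  contents are arbitrary sample data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Anonymous temporary file: a literal undef as the file name.
open(my $anon, '+>', undef) or die "cannot create anonymous temp file: $!";
print {$anon} "scratch data\n";
seek($anon, 0, 0) or die "seek failed: $!";
my $line = <$anon>;
close $anon;

# PerlIO layers: an :encoding layer on top of an in-memory filehandle.
my $bytes = '';
open(my $mem, '>:encoding(UTF-8)', \$bytes) or die "cannot open in-memory handle: $!";
print {$mem} "caf\x{e9}";    # 4 characters, the last one outside ASCII
close $mem;                  # $bytes now holds 5 UTF-8 encoded bytes

# File::Temp: a named temporary file that is removed when the program exits.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close $fh;

# ${^TAINT}: non-zero when running under -T (or -t).
printf "taint mode: %s\n", ${^TAINT} ? 'on' : 'off';
printf "read back: %s", $line;
printf "encoded length: %d bytes\n", length $bytes;
```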
  
  
  =head1 What's new in mod_perl 2.0
  
  The new features introduced by the Apache 2.0 and Perl 5.6 and 5.8
  generations provide the basis for the new mod_perl 2.0 features. In
  addition, mod_perl 2.0 re-implements itself from scratch, providing
  such new features as a new build system and a new testing framework.
  Let's look at the major changes since mod_perl 1.0.
  
  =head2 Threads Support
  
  In order to adapt to the Apache 2.0 threads architecture (for threaded
  MPMs), mod_perl 2.0 needs to use thread-safe Perl interpreters, also
  known as "ithreads" (Interpreter Threads). This mechanism can be
  enabled at compile time and ensures that each Perl interpreter uses
  its private C<PerlInterpreter> structure for storing its symbol
  tables, stacks and other Perl runtime mechanisms. When this separation
  is engaged any number of threads in the same process can safely
  perform concurrent callbacks into Perl.  This of course requires each
  thread to have its own C<PerlInterpreter> object, or at least that
  each instance is only accessed by one thread at any given time.
  
  The first mod_perl generation has only a single C<PerlInterpreter>,
  which is constructed by the parent process, then inherited across the
  forks to child processes.  mod_perl 2.0 has a configurable number of
  C<PerlInterpreters> and two classes of interpreters, I<parent> and
  I<clone>.  A I<parent> is like that in mod_perl 1.0, where the main
  interpreter created at startup time compiles any pre-loaded Perl code.
  A I<clone> is created from the parent using the Perl API
  I<perl_clone()> function.  At request time, I<parent> interpreters are
  only used for making more I<clones>, as the I<clones> are the
  interpreters which actually handle requests.  Care is taken by Perl to
  copy only mutable data, which means that no runtime locking is
  required and read-only data such as the syntax tree is shared from the
  I<parent>, which should reduce the overall mod_perl memory footprint.
  
  Rather than create a C<PerlInterpreter> per thread by default,
  mod_perl creates a pool of interpreters.  The pool mechanism helps cut
  down memory usage a great deal.  As already mentioned, the syntax tree
  is shared between all cloned interpreters.  If your server serves more
  than just mod_perl requests, having fewer PerlInterpreters than
  threads will clearly cut down on memory usage. Finally, perhaps the
  biggest win is memory reuse: as calls are made into Perl subroutines,
  memory allocations are made for variables when they are used for the
  first time.  Subsequent use of variables may allocate more memory,
  e.g. if a scalar variable needs to hold a longer string than it did
  before, or an array has new elements added.  As an optimization, Perl
  hangs onto these allocations, even though their values "go out of
  scope".

  mod_perl 2.0 has much better control over which PerlInterpreters are
  used for incoming requests. The interpreters are stored in two linked
  lists, one for available interpreters and another for busy ones.  When
  needed to handle a request, one interpreter is taken from the head of
  the available list, and it is put back at the head of the same list
  when done.  This means that if, for example, you have 10 interpreters
  configured to be cloned at startup time, but no more than 5 are ever
  used concurrently, those 5 continue to reuse Perl's allocations, while
  the other 5 remain much smaller, but ready to go if the need arises.
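  The reuse behavior of the available list can be illustrated with a
  plain-Perl model. This is a hypothetical sketch, not mod_perl's C
  implementation or the I<tipool> API: each "interpreter" is just a hash
  whose C<served> counter stands in for the memory a real interpreter
  keeps after serving a request, and C<acquire()> and C<release()> are
  made-up names.

```perl
use strict;
use warnings;

# Ten hypothetical interpreters "cloned at startup"; 'served' stands in
# for the allocations a real interpreter would keep after each request.
my @available = map { +{ id => $_, served => 0 } } 1 .. 10;
my %busy;

# Take the interpreter at the head of the available list.
sub acquire {
    my $interp = shift @available or die "no interpreter available\n";
    $busy{ $interp->{id} } = $interp;
    return $interp;
}

# Put it back at the head, so the most recently used one is reused first.
sub release {
    my ($interp) = @_;
    delete $busy{ $interp->{id} };
    unshift @available, $interp;
    return;
}

# 100 rounds in which at most 5 requests are handled concurrently.
for my $round (1 .. 100) {
    my @in_flight = map { acquire() } 1 .. 5;
    $_->{served}++ for @in_flight;
    release($_) for reverse @in_flight;
}

my @warm = grep { $_->{served} > 0 } @available;
printf "%d of %d interpreters were ever used\n", scalar @warm, scalar @available;
# prints "5 of 10 interpreters were ever used"
```

  Because released interpreters go back to the head of the list, the
  same 5 interpreters serve every round, while the other 5 are never
  touched.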
  
  The interpreter pool mechanism has been abstracted into an API known
  as "tipool", I<Thread Item Pool>. This pool can be used to manage any
  data structure of which you want fewer instances than the number of
  configured threads. For example, a replacement for C<Apache::DBI>
  based on the I<tipool> would allow database connections to be reused
  between multiple threads of the same process.
  
  =head2 Thread-environment Issues
  
  The only things you have to worry about are that your code is
  thread-safe and that you don't use functions that affect all threads.
  
  Perl 5.8.0 itself is thread-safe. That means that operations like
  C<push()>, C<map()>, C<chomp()>, C<=>, C</>, C<+=>, etc. are
  thread-safe. Operations that involve system calls may or may not be
  thread-safe: it all depends on whether the underlying C libraries used
  by the Perl functions are thread-safe.
  
  For example the function C<localtime()> is not thread-safe when the
  implementation of asctime(3) is not thread-safe. Other usually
  problematic functions include readdir(), srand(), etc.
  
  Another important issue that shouldn't be missed is what some people
  refer to as I<thread-locality>. Certain functions executed in a single
  thread affect the whole process and therefore all other threads
  running inside that process. For example, if you C<chdir()> in one
  thread, all other threads now see the new current working directory
  as well. Other functions with similar effects include C<umask()>,
  C<chroot()>, etc. Currently there is no cure for this problem. You
  have to find these functions in your code and replace them with
  workarounds.
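  One common workaround for C<chdir()> is to never change the working
  directory at all and to build absolute paths from an explicit base
  directory instead. The sketch below uses the core C<File::Spec>,
  C<File::Temp> and C<Cwd> modules; C<read_relative()> is a hypothetical
  helper name, and the sample file is created only for the demonstration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;
use File::Temp qw(tempdir);

# Thread-friendly replacement for "chdir $base_dir; open my $fh, '<', $relative":
# the path is resolved against $base_dir, so the process-wide current
# working directory, shared by all threads, is never touched.
sub read_relative {
    my ($base_dir, $relative) = @_;
    my $path = File::Spec->catfile($base_dir, $relative);
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    local $/;    # slurp the whole file
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Prepare a sample file in a temporary directory (removed at exit).
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'data.txt');
open(my $out, '>', $file) or die "cannot write $file: $!";
print {$out} "42\n";
close $out;

my $cwd_before = getcwd();
my $content    = read_relative($dir, 'data.txt');
my $cwd_after  = getcwd();
print "read: $content";
print "cwd unchanged\n" if $cwd_before eq $cwd_after;
```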
  
  
  =head2 Perl interface to the APR and Apache APIs
  
  As we have mentioned earlier, Apache 2.0 uses two APIs:
  
  =over
  
  =item *
  
  the Apache Portable Runtime (APR) API, which implements a portable and
  efficient API for working generically with files, threads, processes,
  shared memory, etc.
  
  =item *
  
  the Apache API, which handles issues specific to the web server.
  
  =back
  
  mod_perl 2.0 provides its own very flexible, special-purpose XS code
  generator, which is capable of doing things none of the existing
  generators can handle. It's possible that in the future this generator
  will be generalized and used for other highly complex projects.

  This generator creates the Perl glue code for the public APR and
  Apache APIs almost without the need for any extra code, apart from a
  few thin wrappers that make the API more Perlish.
  
  In particular, since APR can be used outside of Apache, the Perl
  C<APR::> modules can be used outside of Apache as well.
  
  
  
  =head2 Other New Features
  
  In addition to the new features already mentioned, the following are
  of major importance:
  
  =over
  
  =item *
  
  Apache 2.0 protocol modules are supported. Later we will see an
  example of a protocol module running on top of mod_perl 2.0.
  
  =item *
  
  mod_perl 2.0 provides a very simple-to-use interface to the Apache
  filtering API. We will present a filter module example later on.
  
  =item *
  
  A full-featured and flexible C<Apache::Test> framework was developed
  especially for mod_perl testing. Besides testing the core mod_perl
  features, it is used by third-party module writers to easily test
  their modules. Moreover, C<Apache::Test> was adopted by Apache and is
  currently used to test Apache 1.3, Apache 2.0 and other ASF projects.
  Anything that runs on top of Apache can be tested with
  C<Apache::Test>, whether the target is written in Perl, C, PHP, etc.
  
  =item *
  
  Support for the new MPM model lets mod_perl 2.0 scale better on a
  wider range of platforms. For example, if you happened to try mod_perl
  1.0 on Win32, you probably know that requests had to be serialized,
  i.e. only a single request could be processed at a time, rendering the
  Win32 platform unusable with mod_perl as a heavy production service.
  Thanks to the new Apache MPM design, mod_perl 2.0 can now be used
  efficiently on Win32 platforms using the native I<winnt> MPM.
  
  
  =back
  
  
  =head2 Optimizations
  
  The rewrite of mod_perl gives us the chance to build a smarter,
  stronger and faster implementation based on lessons learned over the
  4.5 years since mod_perl was introduced.  There are optimizations
  which can be made in the mod_perl source code, some which can be made
  in the Perl space by optimizing its syntax tree, and some which are a
  combination of both.  In this section we'll take a brief look at some
  of the optimizations that are being considered.

  The details of these optimizations are for the most part hidden from
  mod_perl users, the exception being that some will only be turned on
  with configuration directives.  A few of them include:
  
  =over 4
  
  =item *
  
  "Compiled" C<Perl*Handlers>
  
  =item *
  
  Inlined C<Apache::*.xs> calls
  
  =item *
  
  Use of Apache Pools for memory allocations
  
  =back
  
  =cut

