stas 2002/09/01 23:34:51 Modified: src/docs/2.0/user config.cfg src/docs/2.0/user/config config.pod src/docs/2.0/user/install install.pod Added: src/docs/2.0/user/handlers filters.pod http.pod intro.pod protocols.pod server.pod Removed: src/docs/2.0/user/handlers handlers.pod Log: split the handlers chapter into several chapters for each topic Revision Changes Path 1.13 +10 -2 modperl-docs/src/docs/2.0/user/config.cfg Index: config.cfg =================================================================== RCS file: /home/cvs/modperl-docs/src/docs/2.0/user/config.cfg,v retrieving revision 1.12 retrieving revision 1.13 diff -u -r1.12 -r1.13 --- config.cfg 13 Aug 2002 11:39:02 -0000 1.12 +++ config.cfg 2 Sep 2002 06:34:51 -0000 1.13 @@ -21,11 +21,19 @@ config/config.pod )], - group => 'Coding Techniques', + group => 'Coding', chapters => [qw( - handlers/handlers.pod compat/compat.pod coding/coding.pod + )], + + group => 'mod_perl Handlers', + chapters => [qw( + handlers/intro.pod + handlers/server.pod + handlers/protocols.pod + handlers/http.pod + handlers/filters.pod )], group => 'Troubleshooting', 1.23 +7 -7 modperl-docs/src/docs/2.0/user/config/config.pod Index: config.pod =================================================================== RCS file: /home/cvs/modperl-docs/src/docs/2.0/user/config/config.pod,v retrieving revision 1.22 retrieving revision 1.23 diff -u -r1.22 -r1.23 --- config.pod 2 Sep 2002 03:38:50 -0000 1.22 +++ config.pod 2 Sep 2002 06:34:51 -0000 1.23 @@ -673,7 +673,7 @@ PerlPreConnectionHandler ITERATE SRV PerlProcessConnectionHandler ITERATE SRV - + PerlPostReadRequestHandler ITERATE SRV PerlTransHandler ITERATE SRV PerlInitHandler ITERATE DIR @@ -686,7 +686,7 @@ PerlResponseHandler ITERATE DIR PerlLogHandler ITERATE DIR PerlCleanupHandler ITERATE DIR - + PerlInputFilterHandler ITERATE DIR PerlOutputFilterHandler ITERATE DIR @@ -756,9 +756,9 @@ =item DIR C<E<lt>DirectoryE<gt>>, C<E<lt>LocationE<gt>>, C<E<lt>FilesE<gt>> and -all their regular 
expression variants (mnemonic: I<DIRectory>). These directives -can also appear in I<.htaccess> files. These directives are defined -as C<OR_ALL> in the source code. +all their regular expression variants (mnemonic: I<DIRectory>). These +directives can also appear in I<.htaccess> files. These directives +are defined as C<OR_ALL> in the source code. These directives can also appear in the global server configuration and C<E<lt>VirtualHostE<gt>>. @@ -769,8 +769,8 @@ used by the core mod_perl directives and their definition can be found in I<include/httpd_config.h> (hint: search for C<RSRC_CONF>). -Also see L<Perl*Handler -Types|docs::2.0::user::handlers::handlers/Perl_Handler_Types>. +Also see L<Single Phase's Multiple Handlers +Behavior|docs::2.0::user::handlers::intro/Single_Phase_s_Multiple_Handlers_Behavior>. 1.1 modperl-docs/src/docs/2.0/user/handlers/filters.pod Index: filters.pod =================================================================== =head1 NAME Input and Output Filters =head1 Description This chapter discusses mod_perl's input and output filter handlers. =head1 I/O Filtering Apache 2.0 considers all incoming and outgoing data as chunks of information, disregarding their kind and source or storage methods. These data chunks are stored in I<buckets>, which form I<bucket brigades>. Both input and output filters filter the data in bucket brigades. =head2 PerlInputFilterHandler The C<PerlInputFilterHandler> handler registers a filter for input filtering. This handler is of type C<L<VOID|docs::2.0::user::handlers::intro/item_VOID>>. The handler's configuration scope is C<L<DIR|docs::2.0::user::config::config/item_DIR>>. The following sections include several examples that use the C<PerlInputFilterHandler> handler. =head2 PerlOutputFilterHandler The C<PerlOutputFilterHandler> handler registers and configures output filters. This handler is of type C<L<VOID|docs::2.0::user::handlers::intro/item_VOID>>. 
The handler's configuration scope is C<L<DIR|docs::2.0::user::config::config/item_DIR>>. The following sections include several examples that use the C<PerlOutputFilterHandler> handler. =head2 Connection vs. HTTP Request Filters Currently the mod_perl filters allow connection and request level filtering. Apache supports several other types, which mod_perl 2.0 will probably support in the future. mod_perl filter handlers specify the type of the filter using the method attributes. Request filter handlers are declared using the C<FilterRequestHandler> attribute. Consider the following request input and output filters skeleton: package MyApache::FilterRequestFoo; use base qw(Apache::Filter); sub input : FilterRequestHandler { my($filter, $bb, $mode, $block, $readbytes) = @_; #... } sub output : FilterRequestHandler { my($filter, $bb) = @_; #... } 1; If the attribute is not specified, the default C<FilterRequestHandler> attribute is assumed. Filters specifying subroutine attributes must subclass C<Apache::Filter>, others only need to: use Apache::Filter (); The request filters are usually configured in the C<E<lt>LocationE<gt>> or equivalent sections: PerlModule MyApache::FilterRequestFoo PerlModule MyApache::NiceResponse <Location /filter_foo> SetHandler modperl PerlResponseHandler MyApache::NiceResponse PerlInputFilterHandler MyApache::FilterRequestFoo::input PerlOutputFilterHandler MyApache::FilterRequestFoo::output </Location> Now we have the request input and output filters configured. The connection filter handler uses the C<FilterConnectionHandler> attribute. Here is a similar example for the connection input and output filters. package MyApache::FilterConnectionBar; use base qw(Apache::Filter); sub input : FilterConnectionHandler { my($filter, $bb, $mode, $block, $readbytes) = @_; #... } sub output : FilterConnectionHandler { my($filter, $bb) = @_; #... 
} 1; This time the configuration must be done outside the C<E<lt>LocationE<gt>> or equivalent sections, usually within the C<E<lt>VirtualHostE<gt>> or the global server configuration: Listen 8005 <VirtualHost _default_:8005> PerlModule MyApache::FilterConnectionBar PerlModule MyApache::NiceResponse PerlInputFilterHandler MyApache::FilterConnectionBar::input PerlOutputFilterHandler MyApache::FilterConnectionBar::output <Location /> SetHandler modperl PerlResponseHandler MyApache::NiceResponse </Location> </VirtualHost> This accomplishes the configuration of the connection input and output filters. Notice that for HTTP requests the only difference between connection filters and request filters is that the former see everything: the headers and the body, whereas the latter see only the body. [META: This belongs to the Apache::Filter manpage and should be moved there when this page is created. Inside a connection filter the current connection object can be retrieved with: my $c = $filter->c; Inside a request filter the current request object can be retrieved with: my $r = $filter->r; ] mod_perl provides two interfaces to filtering: a direct bucket brigades manipulation interface and a simpler, stream-oriented interface (XXX: as of this writing the latter is available only for the output filtering). The examples in the following sections will help you to understand the difference between the two interfaces. =head1 All-in-One Filter Before we delve into the details of how to write filters that do something with the data, let's first write a simple filter that does nothing but snoop on the data that goes through it. We are going to develop the C<MyApache::FilterSnoop> handler which can snoop on request and connection filters, in input and output modes.
But first let's develop a simple response handler that simply dumps the request's I<args> and I<content> as strings: file:MyApache/Dump.pm --------------------- package MyApache::Dump; use strict; use warnings; use Apache::RequestRec (); use Apache::RequestIO (); use Apache::Const -compile => qw(OK M_POST); sub handler { my $r = shift; $r->content_type('text/plain'); $r->print("args:\n", $r->args, "\n"); if ($r->method_number == Apache::M_POST) { my $data = content($r); $r->print("content:\n$data\n"); } return Apache::OK; } sub content { my $r = shift; $r->setup_client_block; return '' unless $r->should_client_block; my $len = $r->headers_in->get('content-length'); my $buf; $r->get_client_block($buf, $len); return $buf; } 1; which is configured as: PerlModule MyApache::Dump <Location /dump> SetHandler modperl PerlResponseHandler MyApache::Dump </Location> If we issue the following request: % echo "mod_perl rules" | POST 'http://localhost:8002/dump?foo=1&bar=2' the response will be: args: foo=1&bar=2 content: mod_perl rules As you can see it simply dumped the query string and the posted data. Now let's write the snooping filter: file:MyApache/FilterSnoop.pm ---------------------------- package MyApache::FilterSnoop; use strict; use warnings; use base qw(Apache::Filter); use Apache::FilterRec (); use APR::Brigade (); use Apache::Const -compile => qw(OK DECLINED); use APR::Const -compile => ':common'; sub connection : FilterConnectionHandler { snoop("connection", @_) } sub request : FilterRequestHandler { snoop("request", @_) } sub snoop { my $type = shift; my($filter, $bb, $mode, $block, $readbytes) = @_; # filter args # $mode, $block, $readbytes are passed only for input filters my $stream = defined $mode ? 
"input" : "output"; # read the data and pass-through the bucket brigades unchanged my $ra_data = ''; if (defined $mode) { # input filter my $rv = $filter->next->get_brigade($bb, $mode, $block, $readbytes); return $rv unless $rv == APR::SUCCESS; $ra_data = bb_sniff($bb); } else { # output filter $ra_data = bb_sniff($bb); my $rv = $filter->next->pass_brigade($bb); return $rv unless $rv == APR::SUCCESS; } # send the sniffed info to STDERR so not to interfere with normal # output my $direction = $stream eq 'output' ? ">>>" : "<<<"; print STDERR "\n$direction $type $stream filter\n"; my $c = 1; while (my($btype, $data) = splice @$ra_data, 0, 2) { print STDERR " o bucket $c: $btype\n"; print STDERR "[$data]\n"; $c++; } return Apache::OK; } sub bb_sniff { my $bb = shift; my @data; for (my $b = $bb->first; $b; $b = $bb->next($b)) { $b->read(my $bdata); $bdata = '' unless defined $bdata; push @data, $b->type->name, $bdata; } return [EMAIL PROTECTED]; } 1; This package provides two filter handlers, one for connection and another for request filtering: sub connection : FilterConnectionHandler { snoop("connection", @_) } sub request : FilterRequestHandler { snoop("request", @_) } Both handlers forward their arguments to the C<snoop()> function that does the real job. We needed to add these two subroutines in order to assign the two different attributes. Plus the functions pass the filter type to C<snoop()> as the first argument, which gets shifted off C<@_> and the rest of the C<@_> are the arguments that were originally passed to the filter handler. It's easy to know whether a filter handler is running in the input or the output mode. The arguments C<$filter> and C<$bb> are always passed, whereas the arguments C<$mode>, C<$block>, and C<$readbytes> are passed only to input filter handlers. If we are in the input mode, we retrieve the bucket brigade and immediately link it to C<$bb> which makes the brigade available to the next filter. 
When this filter handler returns, the next filter on the stack will get the brigade. If we forget to perform this linking our filter will become a black hole in which data simply disappears. Next we call C<bb_sniff()> which returns the type and the content of the buckets in the brigade. If we are in the output mode, C<$bb> already points to the current bucket brigade. Therefore we can read the contents of the brigade right away. After that we pass the brigade to the next filter. Finally we dump to STDERR the information about the type of the current mode, and the content of the bucket brigade. Let's snoop on connection and request filter levels in both directions by applying the following configuration: Listen 8008 <VirtualHost _default_:8008> PerlModule MyApache::FilterSnoop PerlModule MyApache::Dump # Connection filters PerlInputFilterHandler MyApache::FilterSnoop::connection PerlOutputFilterHandler MyApache::FilterSnoop::connection <Location /dump> SetHandler modperl PerlResponseHandler MyApache::Dump # Request filters PerlInputFilterHandler MyApache::FilterSnoop::request PerlOutputFilterHandler MyApache::FilterSnoop::request </Location> </VirtualHost> Notice that we use a virtual host because we want to install connection filters. If we issue the following request: % echo "mod_perl rules" | POST 'http://localhost:8008/dump?foo=1&bar=2' We get the same response, because our snooping filter didn't change anything. However, a lot of output was printed to I<error_log>. We present it all here, since it helps a lot to understand how filters work. First we can see the connection input filter at work, as it processes the HTTP headers. We can see that for this request each header is put into a separate brigade with a single bucket. The data is conveniently enclosed by C<[]> so you can see the new line characters as well.
<<< connection input filter o bucket 1: HEAP [POST /dump?foo=1&bar=2 HTTP/1.1 ] <<< connection input filter o bucket 1: HEAP [TE: deflate,gzip;q=0.3 ] <<< connection input filter o bucket 1: HEAP [Connection: TE, close ] <<< connection input filter o bucket 1: HEAP [Host: localhost:8008 ] <<< connection input filter o bucket 1: HEAP [User-Agent: lwp-request/2.01 ] <<< connection input filter o bucket 1: HEAP [Content-Length: 14 ] <<< connection input filter o bucket 1: HEAP [Content-Type: application/x-www-form-urlencoded ] <<< connection input filter o bucket 1: HEAP [ ] Here the HTTP header has been terminated by a double new line. So far all the buckets were of the I<HEAP> type, meaning that they were allocated from the heap memory. Notice that the request input filters will never see the bucket brigade with HTTP header, it has been consumed by the last connection Apache core handler. The following two entries are generated when C<MyApache::Dump::handler> reads the POSTed content: <<< connection input filter o bucket 1: HEAP [mod_perl rules] <<< request input filter o bucket 1: HEAP [mod_perl rules] o bucket 2: EOS [] as we saw earlier on the diagram, the connection input filter is run before the request input filter. Since our connection input filter was passing the data through unmodified and no other connection input filter was configured, the request input filter sees the same data. The last bucket in the brigade received by the request input filter is of type I<EOS>, meaning that all the input data from the current request has been received. Next we can see that C<MyApache::Dump::handler> has generated its response. However only the request output filter is filtering it at this point: >>> request output filter o bucket 1: TRANSIENT [args: foo=1&bar=2 content: mod_perl rules ] This happens because Apache hasn't sent yet the response HTTP headers to the client. Apache postpones the header sending so it can calculate and set the C<Content-Length> header. 
This time the brigade consists of a single bucket of type I<TRANSIENT> which is allocated from the stack memory, which will eventually be converted to the I<HEAP> type, before the body of the response is sent to the client. When the content handler returns Apache sends the HTTP headers through connection output filters (notice that the request output filters don't see it): >>> connection output filter o bucket 1: HEAP [HTTP/1.1 200 OK Date: Wed, 14 Aug 2002 07:31:53 GMT Server: Apache/2.0.41-dev (Unix) mod_perl/1.99_05-dev Perl/v5.8.0 mod_ssl/2.0.41-dev OpenSSL/0.9.6d DAV/2 Content-Length: 42 Connection: close Content-Type: text/plain; charset=ISO-8859-1 ] Now the response body in the bucket of type I<HEAP> is passed through the connection output filter, followed by the I<EOS> bucket to mark the end of the request: >>> connection output filter o bucket 1: HEAP [args: foo=1&bar=2 content: mod_perl rules ] o bucket 2: EOS [] Finally the output is flushed, to make sure that any buffered output is sent to the client: >>> connection output filter o bucket 1: FLUSH [] This module helps to understand that each filter handler can be called many times during each request and connection. It's called for each bucket brigade. Also it's important to notice that the request input filter is called only if there is some POSTed data to read. If you run the same request without POSTing any data, or simply run a GET request, the request input filter won't be called. =head1 Input Filters mod_perl supports L<Connection|/Connection_Input_Filters> and L<HTTP Request|/HTTP_Request_Input_Filters> input filters: =head2 Connection Input Filters Let's say that we want to test how our handlers behave when they are requested as C<HEAD> requests, rather than C<GET>. We can alter the request headers at the incoming connection level transparently to all handlers.
So here is the input filter handler that does that by directly manipulating the bucket brigades: file:MyApache/InputFilterGET2HEAD.pm ----------------------------------- package MyApache::InputFilterGET2HEAD; use strict; use warnings; use base qw(Apache::Filter); use Apache::RequestRec (); use Apache::RequestIO (); use APR::Brigade (); use APR::Bucket (); use Apache::Const -compile => 'OK'; use APR::Const -compile => ':common'; sub handler : FilterConnectionHandler { my($filter, $bb, $mode, $block, $readbytes) = @_; my $c = $filter->c; my $ctx_bb = APR::Brigade->new($c->pool, $c->bucket_alloc); my $rv = $filter->next->get_brigade($ctx_bb, $mode, $block, $readbytes); return $rv unless $rv == APR::SUCCESS; while (!$ctx_bb->empty) { my $bucket = $ctx_bb->first; $bucket->remove; if ($bucket->is_eos) { $bb->insert_tail($bucket); last; } my $data; my $status = $bucket->read($data); return $status unless $status == APR::SUCCESS; if ($data and $data =~ s|^GET|HEAD|) { $bucket = APR::Bucket->new($data); } $bb->insert_tail($bucket); } Apache::OK; } 1; The filter handler is called for each bucket brigade, which in turn includes buckets with data. The gist of any filter handler is to retrieve the bucket brigade sent from the previous filter, prepare a new empty brigade, and move buckets from the former brigade to the latter optionally modifying the buckets on the way, which may include removing or adding new buckets. Of course if the filter doesn't want to modify any of the buckets it may decide to pass through the original brigade without doing any work. In our example the handler first removes the bucket at the top of the brigade and looks at its type. If it sees an end of stream, that removed bucket is linked to the tail of the bucket brigade that will go to the next filter and it doesn't attempt to read any more buckets. 
If this event doesn't happen the handler reads the data from that bucket and if it finds that the data is of interest to us, it modifies the data, creates a new bucket using the modified data and links it to the tail of the outgoing brigade, while discarding the original bucket. In our case the interesting data is anything that matches the regular expression C</^GET/>. If the data is not interesting to the handler, it simply links the unmodified bucket to the outgoing brigade. The handler looks for data like: GET /perl/test.pl HTTP/1.1 and turns it into: HEAD /perl/test.pl HTTP/1.1 For example, consider the following response handler: file:MyApache/RequestType.pm --------------------------- package MyApache::RequestType; use strict; use warnings; use Apache::Const -compile => 'OK'; sub handler { my $r = shift; $r->content_type('text/plain'); $r->print("the request type was " . $r->method); Apache::OK; } 1; which returns to the client the request type it has issued. In the case of the C<HEAD> request Apache will discard the response body, but it will still set the correct C<Content-Length> header, which will be 24 in case of the C<GET> request and 25 for C<HEAD>. Therefore if this response handler is configured as: Listen 8005 <VirtualHost _default_:8005> <Location /> SetHandler modperl PerlResponseHandler +MyApache::RequestType </Location> </VirtualHost> and a C<GET> request is issued to I</>: panic% perl -MLWP::UserAgent -le \ '$r = LWP::UserAgent->new()->get("http://localhost:8005/"); \ print $r->headers->content_length . ": ". $r->content' 24: the request type was GET where the response's body is: the request type was GET And the C<Content-Length> header is set to 24.
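The two lengths quoted above can be checked with plain Perl, outside of mod_perl. This is only a quick sanity check under a stated assumption: the helper name C<body_for> is made up for this illustration, and it merely rebuilds the string that C<MyApache::RequestType::handler> prints.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the chapter's code): rebuild the body
# that MyApache::RequestType::handler prints for a given request method.
sub body_for {
    my $method = shift;
    return "the request type was " . $method;
}

# Report the body length that Apache would use as Content-Length.
for my $method (qw(GET HEAD)) {
    printf "%-4s => Content-Length %d\n", $method, length(body_for($method));
}
```

The body is one character longer for C<HEAD> than for C<GET>, which is the difference between the 24 and 25 values discussed here.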
However if we enable the C<MyApache::InputFilterGET2HEAD> input connection filter: Listen 8005 <VirtualHost _default_:8005> PerlInputFilterHandler +MyApache::InputFilterGET2HEAD <Location /> SetHandler modperl PerlResponseHandler +MyApache::RequestType </Location> </VirtualHost> And issue the same C<GET> request, we get only: 25: which means that the body was discarded by Apache, because our filter turned the C<GET> request into a C<HEAD> request and if Apache wasn't discarding the body on C<HEAD>, the response would be: the request type was HEAD that's why the content length is reported as 25 and not 24 as in the real GET request. =head2 HTTP Request Input Filters Request filters are really no different from connection filters, other than that they work on request and response bodies and have access to a request object. The filter implementation is pretty much identical. Let's look at the request input filter that lowercases the request's body C<MyApache::InputRequestFilterLC>: file:MyApache/InputRequestFilterLC.pm ------------------------------------- package MyApache::InputRequestFilterLC; use strict; use warnings; use base qw(Apache::Filter); use APR::Brigade (); use APR::Bucket (); use Apache::Const -compile => 'OK'; use APR::Const -compile => ':common'; sub handler : FilterRequestHandler { my($filter, $bb, $mode, $block, $readbytes) = @_; my $c = $filter->c; my $bb_ctx = APR::Brigade->new($c->pool, $c->bucket_alloc); my $rv = $filter->next->get_brigade($bb_ctx, $mode, $block, $readbytes); return $rv unless $rv == APR::SUCCESS; while (!$bb_ctx->empty) { my $b = $bb_ctx->first; $b->remove; if ($b->is_eos) { $bb->insert_tail($b); last; } my $data; my $status = $b->read($data); return $status unless $status == APR::SUCCESS; $b = APR::Bucket->new(lc $data) if $data; $bb->insert_tail($b); } Apache::OK; } 1; Now if we use the C<MyApache::Dump> response handler, which we developed earlier in this chapter and which dumps the query string and the content body as
a response, and configure the server as follows: <Location /lc_input> SetHandler modperl PerlResponseHandler +MyApache::Dump PerlInputFilterHandler +MyApache::InputRequestFilterLC </Location> When issuing a POST request: % echo "mOd_pErl RuLeS" | POST 'http://localhost:8002/lc_input?FoO=1&BAR=2' we get a response: args: FoO=1&BAR=2 content: mod_perl rules Indeed, we can see that our filter has lowercased the POSTed body before the content handler received it. You can see that the query string wasn't changed. =head1 Output Filters mod_perl supports L<Connection|/Connection_Output_Filters> and L<HTTP Request|/HTTP_Request_Output_Filters> output filters: =head2 Connection Output Filters Connection filters filter B<all> the data that is going through the server. Therefore if the connection is of HTTP request type, connection output filters see the headers and the body of the response, whereas request output filters see only the response body. META: for now see the request output filter explanations and examples, connection output filter examples will be added soon. Interesting ideas for such filters are welcome (mainly for munging output headers I suppose). =head2 HTTP Request Output Filters As mentioned earlier output filters can be written using the bucket brigades manipulation or the simplified stream-oriented interface. First let's develop a response handler that sends two lines of output: numerals 1-9 followed by 0, and the English alphabet in a single string: file:MyApache/SendAlphaNum.pm ------------------------------- package MyApache::SendAlphaNum; use strict; use warnings; use Apache::RequestRec (); use Apache::RequestIO (); use Apache::Const -compile => qw(OK); sub handler { my $r = shift; $r->content_type('text/plain'); $r->print(1..9, "0\n"); $r->print('a'..'z', "\n"); Apache::OK; } 1; The purpose of our request output filter is to reverse every line of the response, preserving the new line characters in their places.
Since we want to reverse characters only in the response body we will use the request output filters. =head3 Stream-oriented Output Filter The first filter implementation is using the stream-oriented filtering API: file:MyApache/FilterReverse1.pm ---------------------------- package MyApache::FilterReverse1; use strict; use warnings; use base qw(Apache::Filter); use Apache::Const -compile => qw(OK); use constant BUFF_LEN => 1024; sub handler : FilterRequestHandler { my $filter = shift; while ($filter->read(my $buffer, BUFF_LEN)) { for (split "\n", $buffer) { $filter->print(scalar reverse $_); $filter->print("\n"); } } Apache::OK; } 1; Next, we add the following configuration to I<httpd.conf>: PerlModule MyApache::FilterReverse1 PerlModule MyApache::SendAlphaNum <Location /reverse1> SetHandler modperl PerlResponseHandler MyApache::SendAlphaNum PerlOutputFilterHandler MyApache::FilterReverse1 </Location> Now when a request to I</reverse1> is made, the response handler C<MyApache::SendAlphaNum::handler()> sends: 1234567890 abcdefghijklmnopqrstuvwxyz as a response and the output filter handler C<MyApache::FilterReverse1::handler> reverses the lines, so the client gets: 0987654321 zyxwvutsrqponmlkjihgfedcba The C<Apache::Filter> module loads the C<read()> and C<print()> methods which encapsulate the stream-oriented filtering interface. The reversing filter is quite simple: in the loop it reads the data in the I<readline()> mode in chunks up to the buffer length (1024 in our example), and then prints each line reversed while preserving the new line control characters at the end of each line. Behind the scenes C<$filter-E<gt>read()> retrieves the incoming brigade and gets the data from it, whereas C<$filter-E<gt>print()> appends to the new brigade which is then sent to the next filter in the stack. C<read()> breaks the while loop, when the brigade is emptied or the end of stream is received.
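The chunked C<read()> loop can be tried outside the server. The sketch below is a stand-in built on assumptions, not mod_perl code: the hypothetical C<read_chunks()> helper replaces C<$filter-E<gt>read()> by slicing a string, and C<reverse_lines()> collects the output in a string instead of calling C<$filter-E<gt>print()>. It applies the same per-chunk logic as the filter above, so it can show what happens when a line straddles two chunks.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $filter->read(): split $input into chunks of
# at most $len characters.
sub read_chunks {
    my ($input, $len) = @_;
    return unpack "(a$len)*", $input;
}

# Same per-chunk logic as MyApache::FilterReverse1::handler, but appending
# to a string instead of calling $filter->print().
sub reverse_lines {
    my ($input, $len) = @_;
    my $out = '';
    for my $buffer (read_chunks($input, $len)) {
        for (split "\n", $buffer) {
            $out .= scalar(reverse $_) . "\n";
        }
    }
    return $out;
}

my $body = "1234567890\nabcdefghijklmnopqrstuvwxyz\n";
print reverse_lines($body, 1024);   # whole body fits in one chunk
print reverse_lines($body, 8);      # lines straddle chunk boundaries
```

With the large buffer each line is reversed as a whole. With the small buffer every chunk is reversed on its own and gets its own new line, so the output no longer matches the reversed lines.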
In order not to distract the reader from the purpose of the example, the code above is oversimplified: it won't correctly handle input lines longer than 1024 characters, or input that uses a different line termination pattern. So here is an example of a more complete handler, which does take care of these issues: sub handler { my $filter = shift; my $left_over = ''; while ($filter->read(my $buffer, BUFF_LEN)) { $buffer = $left_over . $buffer; $left_over = ''; while ($buffer =~ /([^\r\n]*)([\r\n]*)/g) { $left_over = $1, last unless $2; $filter->print(scalar(reverse $1), $2); } } $filter->print(scalar reverse $left_over) if length $left_over; Apache::OK; } In this handler the lines longer than the buffer's length are buffered up in C<$left_over> and processed only when the whole line is read in, or if there is no more input the buffered up text is flushed before the end of the handler. =head3 Bucket Brigade-based Output Filters The second filter implementation is using the bucket brigades API to accomplish exactly the same task as the first filter.
package MyApache::FilterReverse2; use strict; use warnings; use base qw(Apache::Filter); use APR::Brigade (); use APR::Bucket (); use Apache::Const -compile => 'OK'; use APR::Const -compile => ':common'; sub handler : FilterRequestHandler { my($filter, $bb) = @_; my $c = $filter->c; my $bb_ctx = APR::Brigade->new($c->pool, $c->bucket_alloc); while (!$bb->empty) { my $bucket = $bb->first; $bucket->remove; if ($bucket->is_eos) { $bb_ctx->insert_tail($bucket); last; } my $data; my $status = $bucket->read($data); return $status unless $status == APR::SUCCESS; if ($data) { $data = join "", map {scalar(reverse $_), "\n"} split "\n", $data; $bucket = APR::Bucket->new($data); } $bb_ctx->insert_tail($bucket); } my $rv = $filter->next->pass_brigade($bb_ctx); return $rv unless $rv == APR::SUCCESS; Apache::OK; } 1; and the corresponding configuration: PerlModule MyApache::FilterReverse2 PerlModule MyApache::SendAlphaNum <Location /reverse2> SetHandler modperl PerlResponseHandler MyApache::SendAlphaNum PerlOutputFilterHandler MyApache::FilterReverse2 </Location> Now when a request to I</reverse2> is made, the client gets: 0987654321 zyxwvutsrqponmlkjihgfedcba as expected. The bucket brigades output filter version is just a bit more complicated than the stream-oriented one. The handler receives the incoming bucket brigade C<$bb> as its second argument. Since the handler must pass a brigade to the next filter in the stack when it completes, we create a new bucket brigade into which we put the modified buckets and which we eventually pass to the next filter. The core of the handler is in removing buckets from the head of the bucket brigade C<$bb> while there are some, reading the data from the buckets, reversing it and putting it into a newly created bucket which is inserted at the end of the new bucket brigade. If we see a bucket which designates the end of stream, we insert that bucket at the tail of the new bucket brigade and break the loop.
Finally we pass the created brigade with modified data to the next filter and return. =head1 Filter Tips and Tricks Various tips to use in filters. =head2 Altering the Content-Type Response Header Let's say that you want to modify the C<Content-Type> header in the request output filter: sub handler : FilterRequestHandler { my $filter = shift; ... $filter->r->content_type("text/html; charset=$charset"); ... Request filters have access to the request object, so we simply modify it. =head1 Maintainers Maintainer is the person(s) you should contact with updates, corrections and patches. =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back =head1 Authors =over =item * =back Only the major authors are listed above. For contributors see the Changes file. =cut 1.1 modperl-docs/src/docs/2.0/user/handlers/http.pod Index: http.pod =================================================================== =head1 NAME HTTP Handlers =head1 Description This chapter explains how to implement the HTTP protocol handlers in mod_perl. =head1 HTTP Request Cycle Phases Those familiar with mod_perl 1.0 will find the HTTP request cycle in mod_perl 2.0 to be almost identical to mod_perl 1.0's model. The only difference is in the I<response> phase which now includes filtering. Also the C<PerlHandler> directive has been renamed to C<PerlResponseHandler> to better match the corresponding Apache phase name (I<response>).
The following diagram depicts the HTTP request life cycle and highlights which handlers are available to mod_perl 2.0: =for html <img src="http_cycle.gif" width="600" height="560" align="center" valign="middle" alt="HTTP cycle"><br><br> From the diagram it can be seen that an HTTP request is processed by 11 phases, executed in the following order: =over =item 1 PerlPostReadRequestHandler (PerlInitHandler) =item 2 PerlTransHandler =item 3 PerlHeaderParserHandler (PerlInitHandler) =item 4 PerlAccessHandler =item 5 PerlAuthenHandler =item 6 PerlAuthzHandler =item 7 PerlTypeHandler =item 8 PerlFixupHandler =item 9 PerlResponseHandler =item 10 PerlLogHandler =item 11 PerlCleanupHandler =back It's possible that the cycle will not be completed if any of the phases terminates it, usually when an error happens. Notice that when the response handler is reading the input data it can be filtered through request input filters, which are preceded by connection input filters if any. Similarly the generated response is first run through request output filters and eventually through connection output filters before it's sent to the client. We will talk about filters in detail later in this chapter. Now let's discuss each of the mentioned handlers in detail. =head2 PerlPostReadRequestHandler The I<post_read_request> phase is the first request phase and happens immediately after the request has been read and the HTTP headers have been parsed. This phase is usually used to do processing that must happen once per request. For example C<Apache::Reload> is usually invoked at this phase to reload modified Perl modules. This phase is of type C<L<RUN_ALL|docs::2.0::user::handlers::intro/item_RUN_ALL>>. The handler's configuration scope is C<L<SRV|docs::2.0::user::config::config/item_SRV>>, because at this phase the request has not yet been associated with a particular filename or directory. Now, let's look at an example.
Consider the following registry script: touch.pl -------- use strict; use warnings; use Apache::ServerUtil (); use File::Spec::Functions qw(catfile); my $r = shift; $r->content_type('text/plain'); my $conf_file = catfile Apache::server_root_relative($r->pool, 'conf'), "httpd.conf"; printf "%s is %0.2f minutes old", $conf_file, 60*24*(-M $conf_file); This registry script is supposed to print how long ago I<httpd.conf> was last modified, relative to the time the request started being processed. If you run this script several times you might be surprised that it reports the same value all the time, unless the request happens to be served by a recently started child process, which will then report a different value. Either way, most of the time the value won't be correct. This happens because the C<-M> operator reports the difference between the file's modification time and the value of a special Perl variable C<$^T>. When we run scripts from the command line, this variable is always set to the time when the script gets invoked. Under mod_perl this variable is preset once when the child process starts and doesn't change after that, so all requests see the same time when operators like C<-M>, C<-C> and C<-A> are used. Armed with this knowledge, in order to make our code behave similarly to the command line programs we need to reset C<$^T> to the request's start time, before C<-M> is used. We can change the script itself, but what if we need to do the same change for several other scripts and handlers?
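The dependence of C<-M> on C<$^T> described above can be observed outside Apache. The following standalone sketch is only an illustration: the scratch file, the helper name C<age_in_minutes()> and the ten and thirty minute offsets are assumptions made up for the demonstration, not part of the registry script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Age of a file in minutes, computed the same way touch.pl does it.
sub age_in_minutes {
    my $path = shift;
    return 60 * 24 * (-M $path);
}

# A scratch file stands in for httpd.conf in this sketch.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
my $mtime = (stat $path)[9];

# Simulate a child process that started 10 minutes after the file changed:
# every request served by it would see this same value.
$^T = $mtime + 10 * 60;
my $stale = age_in_minutes($path);

# Simulate resetting $^T for a request that arrives 20 minutes later.
$^T = $mtime + 30 * 60;
my $fresh = age_in_minutes($path);

printf "child start time: %.2f minutes, request time: %.2f minutes\n",
    $stale, $fresh;
```

The file itself never changes between the two measurements; only C<$^T> does, which is why resetting it once per request is enough to correct every C<-M>, C<-C> and C<-A> test that follows.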
A simple C<PerlPostReadRequestHandler> handler, which will be executed as the very first thing in each request, comes in handy here: file:MyApache/TimeReset.pm -------------------------- package MyApache::TimeReset; use strict; use warnings; use Apache::RequestRec (); use Apache::Const -compile => 'OK'; sub handler { my $r = shift; $^T = $r->request_time; return Apache::OK; } 1; We could do: $^T = time(); But to make things more efficient we use C<$r-E<gt>request_time> since the request object C<$r> already stores the request's start time, so we get it without performing an additional system call. To enable it just add to I<httpd.conf>: PerlPostReadRequestHandler MyApache::TimeReset either to the global section, or to the C<E<lt>VirtualHostE<gt>> section if you want this handler to be run only for a specific virtual host. =head2 PerlTransHandler The I<translate> phase is used to perform the translation of a request's URI into a corresponding filename. If no custom handler is provided, the server's standard translation rules (e.g., C<Alias> directives, mod_rewrite, etc.) will continue to be used. A C<PerlTransHandler> handler can alter the default translation mechanism or completely override it. In addition to doing the translation, this stage can be used to modify the URI itself and the request method. This is also a good place to register new handlers for the following phases based on the URI. This phase is of type C<L<RUN_FIRST|docs::2.0::user::handlers::intro/item_RUN_FIRST>>. The handler's configuration scope is C<L<SRV|docs::2.0::user::config::config/item_SRV>>, because at this phase the request has not yet been associated with a particular filename or directory. There are many useful things that can be performed at this stage. Let's look at an example handler that rewrites request URIs, similar to what mod_rewrite does.
For example, if your web-site was originally made of static pages, and now you have moved to dynamic page generation, chances are that you don't want to change the old URIs, because you don't want to break links for those who link to your site. If the URI: http://example.com/news/20021031/09/index.html is now handled by: http://example.com/perl/news.pl?date=20021031&id=09&page=index.html the following handler can do the rewriting work transparently to I<news.pl>, so you can still use the former URI mapping: file:MyApache/RewriteURI.pm --------------------------- package MyApache::RewriteURI; use strict; use warnings; use Apache::RequestRec (); use Apache::Const -compile => qw(DECLINED); sub handler { my $r = shift; my ($date, $id, $page) = $r->uri =~ m|^/news/(\d+)/(\d+)/(.*)| or return Apache::DECLINED; $r->uri("/perl/news.pl"); $r->args("date=$date&id=$id&page=$page"); return Apache::DECLINED; } 1; The handler matches the URI and, if it matches, assigns a new URI via C<$r-E<gt>uri()> and the query string via C<$r-E<gt>args()>. URIs that don't match are left untouched. In both cases it returns C<Apache::DECLINED>, so the next translation handler will get invoked, if more rewrites and translations are needed. Of course if you need to do a more complicated rewriting, this handler can be easily adjusted to do so. To configure this module simply add to I<httpd.conf>: PerlTransHandler +MyApache::RewriteURI =head2 PerlHeaderParserHandler The I<header_parser> phase is the first phase to happen after the request has been mapped to its C<E<lt>LocationE<gt>> (or an equivalent container). At this phase the handler can examine the request headers and take a special action based on these. For example this phase can be used to block evil clients targeting certain resources, while few resources have been spent so far. This phase is of type C<L<RUN_ALL|docs::2.0::user::handlers::intro/item_RUN_ALL>>. The handler's configuration scope is C<L<DIR|docs::2.0::user::config::config/item_DIR>>.
This phase is very similar to C<L<PerlPostReadRequestHandler|/PerlPostReadRequestHandler>>, with the only difference that it's run after the request has been mapped to the resource. Both phases are useful for doing something once per request, as early as possible. And usually you can take any C<L<PerlPostReadRequestHandler|/PerlPostReadRequestHandler>> and turn it into C<L<PerlHeaderParserHandler|/PerlHeaderParserHandler>> by simply changing the directive name in I<httpd.conf> and moving it inside the container where it should be executed. Moreover, because of this similarity mod_perl provides a special directive C<L<PerlInitHandler|/PerlInitHandler>> which if found outside resource containers behaves as C<L<PerlPostReadRequestHandler|/PerlPostReadRequestHandler>>, otherwise as C<L<PerlHeaderParserHandler|/PerlHeaderParserHandler>>. You already know that Apache handles the C<HEAD>, C<GET>, C<POST> and several other HTTP methods. But did you know that you can invent your own HTTP method as long as there is a client that supports it? If you think of emails, they are very similar to HTTP messages: they have a set of headers and a body, sometimes a multi-part body. Therefore we can develop a handler that extends HTTP by adding support for the C<EMAIL> method.
We can enable this protocol extension and push the real content handler during the C<L<PerlHeaderParserHandler|/PerlHeaderParserHandler>> phase: <Location /email> PerlHeaderParserHandler MyApache::SendEmail </Location> and here is the C<MyApache::SendEmail> handler: file:MyApache/SendEmail.pm -------------------------- package MyApache::SendEmail; use strict; use warnings; use Apache::RequestRec (); use Apache::RequestIO (); use Apache::RequestUtil (); use Apache::Const -compile => qw(DECLINED OK); use constant METHOD => 'EMAIL'; use constant SMTP_HOSTNAME => "localhost"; sub handler { my $r = shift; return Apache::DECLINED unless $r->method eq METHOD; Apache::method_register($r->pool, METHOD); $r->handler("perl-script"); $r->push_handlers(PerlResponseHandler => \&send_email_handler); return Apache::OK; } sub send_email_handler { my $r = shift; my %headers = map {$_ => $r->headers_in->get($_)} qw(To From Subject); my $content = content($r); my $status = send_email(\%headers, \$content); $r->content_type('text/plain'); $r->print($status ? "ACK" : "NACK"); return Apache::OK; } sub content { my $r = shift; $r->setup_client_block; return '' unless $r->should_client_block; my $len = $r->headers_in->get('content-length'); my $buf; $r->get_client_block($buf, $len); return $buf; } sub send_email { my($rh_headers, $r_body) = @_; require MIME::Lite; MIME::Lite->send("smtp", SMTP_HOSTNAME, Timeout => 60); my $msg = MIME::Lite->new(%$rh_headers, Data => $$r_body); #warn $msg->as_string; $msg->send; } 1; Let's get the less interesting code out of the way. The function content() grabs the request body. The function send_email() sends the email over SMTP. You should adjust the constant C<SMTP_HOSTNAME> to point to your outgoing SMTP server. You can replace this function with your own if you prefer to use a different method to send email. Now to the more interesting functions.
The function C<handler()> returns immediately and passes the control to the next handler if the request method is not equal to C<EMAIL> (set in the C<METHOD> constant): return Apache::DECLINED unless $r->method eq METHOD; Next it tells Apache that this new method is a valid one and that the C<perl-script> handler will do the processing. Finally it pushes the function C<send_email_handler()> to the C<PerlResponseHandler> list of handlers: Apache::method_register($r->pool, METHOD); $r->handler("perl-script"); $r->push_handlers(PerlResponseHandler => \&send_email_handler); The function terminates the header_parser phase by: return Apache::OK; All other phases run as usual, so you can reuse any HTTP protocol hooks, such as authentication and fixup phases. When the response phase starts C<send_email_handler()> is invoked, assuming that no other response handlers were inserted before it. The response handler consists of three parts. Retrieve the email headers C<To>, C<From> and C<Subject>, and the body of the message: my %headers = map {$_ => $r->headers_in->get($_)} qw(To From Subject); my $content = content($r); Then send the email: my $status = send_email(\%headers, \$content); Finally return to the client a simple response acknowledging that email has been sent and finish the response phase by returning C<Apache::OK>: $r->content_type('text/plain'); $r->print($status ? "ACK" : "NACK"); return Apache::OK; Of course you will want to add extra validations if you want to use this code in production. This is just a proof of concept implementation. As already mentioned, when you extend the HTTP protocol you need to have a client that knows how to use the extension.
So here is a simple client that uses C<LWP::UserAgent> to issue an C<EMAIL> method request over the HTTP protocol: file:send_http_email.pl ----------------------- #!/usr/bin/perl use strict; use warnings; require LWP::UserAgent; my $url = "http://localhost:8000/email/"; my %headers = ( From => '[EMAIL PROTECTED]', To => '[EMAIL PROTECTED]', Subject => '3 weeks in Tibet', ); my $content = <<EOI; I didn't have an email software, but could use HTTP so I'm sending it over HTTP EOI my $headers = HTTP::Headers->new(%headers); my $req = HTTP::Request->new("EMAIL", $url, $headers, $content); my $res = LWP::UserAgent->new->request($req); print $res->is_success ? $res->content : "failed"; Most of the code is just custom data. The code that does something consists of four lines at the very end. Create the C<HTTP::Headers> and C<HTTP::Request> objects. Issue the request and get the response. Finally print the response's content if it was successful or just I<"failed"> if not. Now save the client code in the file I<send_http_email.pl>, adjust the I<To> field, make the file executable and execute it, after you have restarted the server. You should receive an email shortly to the address set in the I<To> field. =head2 PerlInitHandler When configured inside any container directive, except C<E<lt>VirtualHostE<gt>>, this handler is an alias for C<L<PerlHeaderParserHandler|/PerlHeaderParserHandler>> described earlier. Otherwise it acts as an alias for C<L<PerlPostReadRequestHandler|/PerlPostReadRequestHandler>>, also described earlier. It is the first handler to be invoked when serving a request. This phase is of type C<L<RUN_ALL|docs::2.0::user::handlers::intro/item_RUN_ALL>>. The best example here would be to use C<L<Apache::Reload|docs::2.0::api::mod_perl-2.0::Apache::Reload>> which takes advantage of this directive.
Usually C<L<Apache::Reload|docs::2.0::api::mod_perl-2.0::Apache::Reload>> is configured as: PerlInitHandler Apache::Reload PerlSetVar ReloadAll Off PerlSetVar ReloadModules "MyApache::*" which will monitor and reload all C<MyApache::*> modules that have been modified since the last request. However if we move the global configuration into a C<E<lt>LocationE<gt>> container: <Location /devel> PerlInitHandler Apache::Reload PerlSetVar ReloadAll Off PerlSetVar ReloadModules "MyApache::*" SetHandler perl-script PerlResponseHandler ModPerl::Registry Options +ExecCGI </Location> C<L<Apache::Reload|docs::2.0::api::mod_perl-2.0::Apache::Reload>> will reload the modified modules only when a request to the I</devel> namespace is issued, because C<L<PerlInitHandler|/PerlInitHandler>> plays the role of C<L<PerlHeaderParserHandler|/PerlHeaderParserHandler>> here. =head2 PerlAccessHandler The I<access_checker> phase is the first of three handlers that are involved in what's known as AAA: Authentication, Authorization, and Access control. This phase can be used to restrict access from a certain IP address, time of the day or any other rule not connected to the user's identity. This phase is of type C<L<RUN_ALL|docs::2.0::user::handlers::intro/item_RUN_ALL>>. The handler's configuration scope is C<L<DIR|docs::2.0::user::config::config/item_DIR>>. The concept behind the access checker handler is very simple: return C<Apache::FORBIDDEN> if the access is not allowed, otherwise return C<Apache::OK>. The following example handler blocks requests made from IPs on the blacklist. file:MyApache/BlockByIP.pm -------------------------- package MyApache::BlockByIP; use strict; use warnings; use Apache::RequestRec (); use Apache::Connection (); use Apache::Const -compile => qw(FORBIDDEN OK); my %bad_ips = map {$_ => 1} qw(127.0.0.1 10.0.0.4); sub handler { my $r = shift; return exists $bad_ips{$r->connection->remote_ip} ?
Apache::FORBIDDEN : Apache::OK; } 1; The handler retrieves the connection's IP address, looks it up in the hash of blacklisted IPs and forbids the access if found. If the IP is not blacklisted, the handler returns control to the next access checker handler, which may still block the access based on a different rule. To enable the handler simply add it to the container that needs to be protected. For example to protect access to the registry scripts executed from the base location I</perl> add: <Location /perl/> SetHandler perl-script PerlResponseHandler ModPerl::Registry PerlAccessHandler MyApache::BlockByIP Options +ExecCGI </Location> =head2 PerlAuthenHandler The I<check_user_id> (I<authen>) phase is called whenever the requested file or directory is password protected. This, in turn, requires that the directory be associated with C<AuthName>, C<AuthType> and at least one C<require> directive. This phase is usually used to verify a user's identification credentials. If the credentials are verified to be correct, the handler should return C<OK>. Otherwise the handler returns C<AUTH_REQUIRED> to indicate that the user has not authenticated successfully. When Apache sends the HTTP header with this code, the browser will normally pop up a dialog box that prompts the user for login information. This phase is of type C<L<RUN_FIRST|docs::2.0::user::handlers::intro/item_RUN_FIRST>>. The handler's configuration scope is C<L<DIR|docs::2.0::user::config::config/item_DIR>>. The following handler authenticates users by asking for a username and a password and lets them in only if the length of a string made from the supplied username and password and a single space equals the secret length, specified by the constant C<SECRET_LENGTH>.
file:MyApache/SecretLengthAuth.pm --------------------------------- package MyApache::SecretLengthAuth; use strict; use warnings; use Apache::Access (); use Apache::RequestRec (); use Apache::Const -compile => qw(OK DECLINED AUTH_REQUIRED); use constant SECRET_LENGTH => 14; sub handler { my $r = shift; my ($status, $password) = $r->get_basic_auth_pw; return $status unless $status == Apache::OK; return Apache::OK if SECRET_LENGTH == length join " ", $r->user, $password; $r->note_basic_auth_failure; return Apache::AUTH_REQUIRED; } 1; First the handler retrieves the status of the authentication and the password in plain text. The status will be set to C<Apache::OK> only when the user has supplied the username and the password credentials. If the status is different, we just let Apache handle this situation for us, which will usually challenge the client so it'll supply the credentials. Once we know that we have the username and the password supplied by the client, we can proceed with the authentication. Our authentication algorithm is unusual. Instead of validating the username/password pair against a password file, we simply check that the string built from these two items plus a single space is C<SECRET_LENGTH> long (14 in our example). So for example the pair I<mod_perl/rules> authenticates correctly, whereas I<secret/password> does not, because the latter pair will make a string of 15 characters. Of course this is not a strong authentication scheme and you shouldn't use it for serious things, but it's fun to play with. Most authentication validations simply verify the username/password against a database of valid pairs. Usually this requires the password to be encrypted first, since storing passwords in clear is a bad idea. Finally if our authentication fails the handler calls note_basic_auth_failure() and returns C<Apache::AUTH_REQUIRED>, which together set the proper HTTP response headers and tell the client that the authentication has failed and the credentials should be supplied again.
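The length check described above can be tried outside Apache. The following standalone sketch reuses the same C<SECRET_LENGTH> value and the two username/password pairs mentioned in the text; the helper name C<credentials_ok()> is an assumption introduced only for this illustration and is not part of the handler.

```perl
use strict;
use warnings;

use constant SECRET_LENGTH => 14;

# The same test the handler applies: username, a single space, password.
sub credentials_ok {
    my ($user, $password) = @_;
    return length(join " ", $user, $password) == SECRET_LENGTH;
}

# The two pairs discussed in the text.
for my $pair (['mod_perl', 'rules'], ['secret', 'password']) {
    my ($user, $password) = @$pair;
    printf "%s/%s: %s\n", $user, $password,
        credentials_ok($user, $password) ? "accepted" : "rejected";
}
```

Under these assumptions the first pair is accepted (8 + 1 + 5 = 14 characters) and the second is rejected (6 + 1 + 8 = 15 characters), which also shows why the scheme is a toy: any pair of the right combined length gets in.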
It's not enough to enable this handler for the authentication to work. You have to tell Apache what authentication scheme to use (C<Basic> or C<Digest>), which is specified by the C<AuthType> directive, and you should also supply the C<AuthName> -- the authentication realm, which is really just a string that the client usually uses as a title in the pop-up box, where the username and the password are inserted. Finally the C<Require> directive is needed to specify which usernames are allowed to authenticate. If you set it to C<valid-user> any username will do. Here is the whole configuration section that requires users to authenticate before they are allowed to run the registry scripts from I</perl/>: <Location /perl/> SetHandler perl-script PerlResponseHandler ModPerl::Registry PerlAuthenHandler MyApache::SecretLengthAuth Options +ExecCGI AuthType Basic AuthName "The Gate" Require valid-user </Location> =head2 PerlAuthzHandler The I<auth_checker> (I<authz>) phase is used for authorization control. This phase requires a successful authentication from the previous phase, because a username is needed in order to decide whether a user is authorized to access the requested resource. As this phase is tightly connected to the authentication phase, the handlers registered for this phase are only called when the requested resource is password protected, similar to the auth phase. The handler is expected to return C<Apache::DECLINED> to defer the decision, C<Apache::OK> to indicate its acceptance of the user's authorization, or C<Apache::AUTH_REQUIRED> to indicate that the user is not authorized to access the requested document. This phase is of type C<L<RUN_FIRST|docs::2.0::user::handlers::intro/item_RUN_FIRST>>. The handler's configuration scope is C<L<DIR|docs::2.0::user::config::config/item_DIR>>. 
Here is the C<MyApache::SecretResourceAuthz> handler which allows access to certain resources only to certain users who have already properly authenticated: file:MyApache/SecretResourceAuthz.pm ------------------------------------ package MyApache::SecretResourceAuthz; use strict; use warnings; use Apache::Access (); use Apache::RequestRec (); use Apache::Const -compile => qw(OK AUTH_REQUIRED); use constant SECRET_LENGTH => 14; my %protected = ( 'admin' => ['stas'], 'report' => [qw(stas boss)], ); sub handler { my $r = shift; my $user = $r->user; if ($user) { my($section) = $r->uri =~ m|^/company/(\w+)/|; if (defined $section && exists $protected{$section}) { my $users = $protected{$section}; return Apache::OK if grep { $_ eq $user } @$users; } else { return Apache::OK; } } $r->note_basic_auth_failure; return Apache::AUTH_REQUIRED; } 1; This authorization handler is very similar to the authentication handler L<from the previous section|/PerlAuthenHandler>. Here we rely on the previous phase to get users authenticated, and now that we have the username we can decide whether to let the user access the requested resource or not. In our example we have a simple hash which maps which users are allowed to access what resources. So for example anything under I</company/admin/> can be accessed only by the user I<stas>, I</company/report/> can be accessed by users I<stas> and I<boss>, whereas any other resources under I</company/> can be accessed by everybody who has got this far. If for some reason we don't get the username, or the user is not authorized to access the resource, the handler does the same thing as it does when the authentication fails, i.e., calls: $r->note_basic_auth_failure; return Apache::AUTH_REQUIRED; The configuration is similar to the one in L<the previous section|/PerlAuthenHandler>, this time we just add the C<PerlAuthzHandler> setting. The rest doesn't change.
Alias /company/ /home/httpd/httpd-2.0/perl/ <Location /company/> SetHandler perl-script PerlResponseHandler ModPerl::Registry PerlAuthenHandler MyApache::SecretLengthAuth PerlAuthzHandler MyApache::SecretResourceAuthz Options +ExecCGI AuthType Basic AuthName "The Secret Gate" Require valid-user </Location> =head2 PerlTypeHandler The I<type_checker> phase is used to set the response MIME type (C<Content-type>) and sometimes other bits of document type information like the document language. For example C<mod_autoindex>, which performs automatic directory indexing, uses this phase to map the filename extensions to the corresponding icons which will be later used in the listing of files. Of course later phases may override the mime type set in this phase. This phase is of type C<L<RUN_FIRST|docs::2.0::user::handlers::intro/item_RUN_FIRST>>. The handler's configuration scope is C<L<DIR|docs::2.0::user::config::config/item_DIR>>. The most important thing to remember when overriding the default I<type_checker> handler, which is usually the mod_mime handler, is that you have to set the handler that will take care of the response phase and the response callback function or the code won't work. mod_mime does that based on C<SetHandler> and C<AddHandler> directives, and file extensions. So if you want the content handler to be run by mod_perl, set either: $r->handler('perl-script'); $r->set_handlers(PerlResponseHandler => \&handler); or: $r->handler('modperl'); $r->set_handlers(PerlResponseHandler => \&handler); depending on which type of response handler is wanted. Writing a C<PerlTypeHandler> handler which sets the content-type value and returns C<Apache::DECLINED> so that the default handler will do the rest of the work, is not a good idea, because mod_mime will probably override this and other settings. Therefore it's easiest to leave this stage alone and do any desired settings in the I<fixups> phase.
=head2 PerlFixupHandler The I<fixups> phase happens just before the content handling phase. It gives the last chance to do things before the response is generated. For example in this phase C<mod_env> populates the environment with variables configured with I<SetEnv> and I<PassEnv> directives. This phase is of type C<L<RUN_ALL|docs::2.0::user::handlers::intro/item_RUN_ALL>>. The handler's configuration scope is C<L<DIR|docs::2.0::user::config::config/item_DIR>>. The following fixup handler example tells Apache at run time which handler and callback should be used to process the request based on the file extension of the request's URI. file:MyApache/FileExtDispatch.pm -------------------------------- package MyApache::FileExtDispatch; use strict; use warnings; use Apache::RequestRec (); use Apache::RequestIO (); use Apache::RequestUtil (); use Apache::Const -compile => 'OK'; use constant HANDLER => 0; use constant CALLBACK => 1; my %exts = ( cgi => ['perl-script', \&cgi_handler], pl => ['modperl', \&pl_handler ], tt => ['perl-script', \&tt_handler ], txt => ['default-handler', undef ], ); sub handler { my $r = shift; my ($ext) = $r->uri =~ /\.(\w+)$/; $ext = 'txt' unless defined $ext and exists $exts{$ext}; $r->handler($exts{$ext}->[HANDLER]); if (defined $exts{$ext}->[CALLBACK]) { $r->set_handlers(PerlResponseHandler => $exts{$ext}->[CALLBACK]); } return Apache::OK; } sub cgi_handler { content_handler($_[0], 'cgi') } sub pl_handler { content_handler($_[0], 'pl') } sub tt_handler { content_handler($_[0], 'tt') } sub content_handler { my($r, $type) = @_; $r->content_type('text/plain'); $r->print("A handler of type '$type' was called"); return Apache::OK; } 1; In the example we have used the following mapping.
my %exts = ( cgi => ['perl-script', \&cgi_handler], pl => ['modperl', \&pl_handler ], tt => ['perl-script', \&tt_handler ], txt => ['default-handler', undef ], ); So that I<.cgi> requests will be handled by the C<perl-script> handler and the C<cgi_handler()> callback, I<.pl> requests by C<modperl> and C<pl_handler()>, I<.tt> (template toolkit) by C<perl-script> and the C<tt_handler()>, finally I<.txt> requests by the C<default-handler> handler, which requires no callback. Moreover the handler assumes that if the request's URI has no file extension or it does, but it's not in its mapping, the C<default-handler> will be used, as if the I<txt> extension was used. After doing the mapping, the handler assigns the handler: $r->handler($exts{$ext}->[HANDLER]); and the callback if needed: if (defined $exts{$ext}->[CALLBACK]) { $r->set_handlers(PerlResponseHandler => $exts{$ext}->[CALLBACK]); } In this simple example the callback functions do little more than call the same content handler, which simply prints the name of the extension if handled by mod_perl, otherwise Apache will serve the other files using the default handler. In the real world you will use callbacks to real content handlers that do real things. Here is how this handler is configured: Alias /dispatch/ /home/httpd/dispatch/ <Location /dispatch/> PerlFixupHandler MyApache::FileExtDispatch </Location> Notice that there is no need to specify anything but the fixup handler. It applies the rest of the settings dynamically at run-time. =head2 PerlResponseHandler The I<handler> (I<response>) phase is used for generating the response. This is probably the most important phase and most of the existing Apache modules do most of their work at this phase. This is the only phase that requires two directives under mod_perl.
For example: <Location /perl> SetHandler perl-script PerlResponseHandler ModPerl::Registry </Location> C<SetHandler> set to L<C<perl-script>|docs::2.0::user::config::config/perl_script> or L<C<modperl>|docs::2.0::user::config::config/modperl> tells Apache that mod_perl is going to handle the response generation. C<PerlResponseHandler> tells mod_perl which callback is going to do the job. This phase is of type C<L<RUN_FIRST|docs::2.0::user::handlers::intro/item_RUN_FIRST>>. The handler's configuration scope is C<L<DIR|docs::2.0::user::config::config/item_DIR>>. Most of the C<Apache::> modules on CPAN are dealing with this phase. In fact most of the developers spend the majority of their time working on handlers that generate response content. Let's write a simple response handler, that just generates some content. This time let's do something more interesting than printing I<"Hello world">. Let's write a handler that prints itself: file:MyApache/Deparse.pm ------------------------ package MyApache::Deparse; use strict; use warnings; use Apache::RequestRec (); use Apache::RequestIO (); use B::Deparse (); use Apache::Const -compile => 'OK'; sub handler { my $r = shift; $r->content_type('text/plain'); $r->print('sub handler ', B::Deparse->new->coderef2text(\&handler)); return Apache::OK; } 1; To enable this handler add to I<httpd.conf>: <Location /deparse> SetHandler modperl PerlResponseHandler MyApache::Deparse </Location> Now when the server is restarted and we issue a request to I<http://localhost/deparse> we get the following response: sub handler { package MyApache::Deparse; my $r = shift @_; $r->content_type('text/plain'); $r->print('sub handler ', 'B::Deparse'->new->coderef2text(\&handler)); return 0; } If you compare it to the source code, it's pretty much the same. C<B::Deparse> is fun to play with! =head2 PerlLogHandler The I<log_transaction> phase happens no matter how the previous phases have ended up.
If one of the earlier phases has aborted a request, e.g., failed authentication or 404 (file not found) errors, the rest of the phases up to and including the response phase are skipped. But this phase is always executed. By this phase all the information about the request and the response is known, therefore the logging handlers usually record this information in various ways (e.g., logging to a flat file or a database). This phase is of type C<L<RUN_ALL|docs::2.0::user::handlers::intro/item_RUN_ALL>>. The handler's configuration scope is C<L<DIR|docs::2.0::user::config::config/item_DIR>>. Imagine a situation where you have to log requests into individual files, one per user. Assume that all requests start with I</users/username/>, so it's easy to categorize requests by the second URI path component. Here is the log handler that does that: file:MyApache/LogPerUser.pm --------------------------- package MyApache::LogPerUser; use strict; use warnings; use Apache::RequestRec (); use Apache::Connection (); use Apache::ServerUtil (); use Fcntl qw(:flock); use Apache::Const -compile => qw(OK DECLINED); sub handler { my $r = shift; my($username) = $r->uri =~ m|^/users/([^/]+)|; return Apache::DECLINED unless defined $username; my $entry = sprintf qq(%s [%s] "%s" %d %d\n), $r->connection->remote_ip, scalar(localtime), $r->uri, $r->status, $r->bytes_sent; my $log_path = Apache::server_root_relative($r->pool, "logs/$username.log"); open my $fh, ">>", $log_path or die "can't open $log_path: $!"; flock $fh, LOCK_EX; print $fh $entry; close $fh; return Apache::OK; } 1; First the handler tries to figure out what username the request is issued for. If it fails to match the URI, it simply returns C<Apache::DECLINED>, letting other log handlers do the logging. It could just as well return C<Apache::OK>, since all other log handlers will be run anyway. Next it builds the log entry, similar to the default I<access_log> entry.
It consists of the remote IP, the current time, the URI, the return status and how many bytes were sent to the client as a response body. Finally the handler appends this entry to the log file for the user the request was issued for. Usually it's safe to append short strings to the file without being afraid of messing up the file, when two processes attempt to write at the same time, but just to be on the safe side the handler exclusively locks the file before performing the writing. To configure the handler simply enable the module with the C<PerlLogHandler> directive, inside the desired section, which was I</users/> in our example: <Location /users/> SetHandler perl-script PerlResponseHandler ModPerl::Registry PerlLogHandler MyApache::LogPerUser Options +ExecCGI </Location> After restarting the server and issuing requests to the following URIs: http://localhost/users/stas/test.pl http://localhost/users/eric/test.pl http://localhost/users/stas/date.pl The C<MyApache::LogPerUser> handler will append to I<logs/stas.log>: 127.0.0.1 [Sat Aug 31 01:50:38 2002] "/users/stas/test.pl" 200 8 127.0.0.1 [Sat Aug 31 01:50:40 2002] "/users/stas/date.pl" 200 44 and to I<logs/eric.log>: 127.0.0.1 [Sat Aug 31 01:50:39 2002] "/users/eric/test.pl" 200 8 =head2 PerlCleanupHandler META: not implemented yet This phase is of type C<XXX>. The handler's configuration scope is C<XXX>. =head1 Maintainers Maintainer is the person(s) you should contact with updates, corrections and patches. =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back =head1 Authors =over =item * =back Only the major authors are listed above. For contributors see the Changes file. =cut 1.1 modperl-docs/src/docs/2.0/user/handlers/intro.pod Index: intro.pod =================================================================== =head1 NAME Introducing mod_perl Handlers =head1 Description This chapter provides an introduction to mod_perl handlers. =head1 What are Handlers?
Apache distinguishes between numerous phases for which it provides hooks (because the C functions are called I<ap_hook_E<lt>phase_nameE<gt>>) where modules can plug in various callbacks to extend and alter the default behavior of the webserver. mod_perl provides a Perl interface for most of the available hooks, so mod_perl module writers can change the Apache behavior in Perl. These callbacks are usually referred to as I<handlers> and therefore the configuration directives for the mod_perl handlers look like: C<PerlFooHandler>, where C<Foo> is one of the handler names. For example C<PerlResponseHandler> configures the response callback. A typical handler is simply a Perl package with a I<handler> subroutine. For example: file:MyApache/CurrentTime.pm ---------------------------- package MyApache::CurrentTime; use strict; use warnings; use Apache::RequestRec (); use Apache::RequestIO (); use Apache::Const -compile => qw(OK); sub handler { my $r = shift; $r->content_type('text/plain'); $r->print("Now is: " . scalar(localtime) . "\n"); return Apache::OK; } 1; This handler simply returns the current date and time as a response. Since this is a response handler, we configure it as such in I<httpd.conf>: PerlResponseHandler MyApache::CurrentTime Since the response handler should be configured for a specific location, let's write a complete configuration section: PerlModule MyApache::CurrentTime <Location /time> SetHandler modperl PerlResponseHandler MyApache::CurrentTime </Location> Now when a request is issued to I<http://localhost/time> this response handler is executed and a response that includes the current time is returned to the client.
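Because a handler is just a package with a I<handler> subroutine, its logic can be exercised outside the server with a stand-in request object. The following is only a sketch under stated assumptions: C<MockRequest> is a hypothetical test double that implements just the two methods the handler calls, and C<Apache::OK> is stubbed as C<0> because C<Apache::Const> is not available outside mod_perl. It does not show how mod_perl itself invokes handlers.

```perl
use strict;
use warnings;

# Assumption: a stand-in for the constant normally compiled in by
# Apache::Const; only needed because this sketch runs outside Apache.
sub Apache::OK () { 0 }

# Hypothetical test double, not part of mod_perl: it records what the
# handler sets and prints instead of talking to a client.
package MockRequest;

sub new { my $class = shift; return bless { type => undef, body => '' }, $class }

sub content_type {
    my ($self, $type) = @_;
    $self->{type} = $type if defined $type;
    return $self->{type};
}

sub print {
    my ($self, @chunks) = @_;
    $self->{body} .= join '', @chunks;
    return 1;
}

sub body { return $_[0]{body} }

# The same handler body as in MyApache::CurrentTime above.
package MyApache::CurrentTime;

sub handler {
    my $r = shift;
    $r->content_type('text/plain');
    $r->print("Now is: " . scalar(localtime) . "\n");
    return Apache::OK;
}

package main;

my $r  = MockRequest->new;
my $rc = MyApache::CurrentTime::handler($r);

printf "status: %d, type: %s\n", $rc, $r->content_type;
print "body: ", $r->body;
```

Run with plain C<perl>, this prints the status, the content type and the generated body. Under the real server the request object is supplied by mod_perl and the output goes to the client.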
=head1 mod_perl Handlers Categories The mod_perl handlers can be divided by their application scope in several categories: =over =item * L<Server life cycle|docs::2.0::user::handlers::server/> =over =item * C<L<PerlOpenLogsHandler|docs::2.0::user::handlers::server/PerlOpenLogsHandler>> =item * C<L<PerlPostConfigHandler|docs::2.0::user::handlers::server/PerlPostConfigHandler>> =item * C<L<PerlChildInitHandler|docs::2.0::user::handlers::server/PerlChildInitHandler>> =item * C<L<PerlChildExitHandler|docs::2.0::user::handlers::server/PerlChildExitHandler>> =back =item * L<Protocols|docs::2.0::user::handlers::protocols/> =over =item * C<L<PerlPreConnectionHandler|docs::2.0::user::handlers::protocols/PerlPreConnectionHandler>> =item * C<L<PerlProcessConnectionHandler|docs::2.0::user::handlers::protocols/PerlProcessConnectionHandler>> =back =item * L<Filters|docs::2.0::user::handlers::filters/> =over =item * C<L<PerlInputFilterHandler|docs::2.0::user::handlers::filters/PerlInputFilterHandler>> =item * C<L<PerlOutputFilterHandler|docs::2.0::user::handlers::filters/PerlOutputFilterHandler>> =back =item * L<HTTP Protocol|docs::2.0::user::handlers::http/> =over =item * C<L<PerlPostReadRequestHandler|docs::2.0::user::handlers::http/PerlPostReadRequestHandler>> =item * C<L<PerlTransHandler|docs::2.0::user::handlers::http/PerlTransHandler>> =item * C<L<PerlInitHandler|docs::2.0::user::handlers::http/PerlInitHandler>> =item * C<L<PerlHeaderParserHandler|docs::2.0::user::handlers::http/PerlHeaderParserHandler>> =item * C<L<PerlAccessHandler|docs::2.0::user::handlers::http/PerlAccessHandler>> =item * C<L<PerlAuthenHandler|docs::2.0::user::handlers::http/PerlAuthenHandler>> =item * C<L<PerlAuthzHandler|docs::2.0::user::handlers::http/PerlAuthzHandler>> =item * C<L<PerlTypeHandler|docs::2.0::user::handlers::http/PerlTypeHandler>> =item * C<L<PerlFixupHandler|docs::2.0::user::handlers::http/PerlFixupHandler>> =item * 
C<L<PerlResponseHandler|docs::2.0::user::handlers::http/PerlResponseHandler>> =item * C<L<PerlLogHandler|docs::2.0::user::handlers::http/PerlLogHandler>> =item * C<L<PerlCleanupHandler|docs::2.0::user::handlers::http/PerlCleanupHandler>> =back =back =head1 Bucket Brigades Apache 2.0 allows multiple modules to filter both the request and the response. Now one module can pipe its output as an input to another module as if that module were receiving the data directly from the TCP stream. The same mechanism works with the generated response. With I/O filtering in place, simple filters, like data compression and decompression, can be easily implemented and complex filters, like SSL, are now possible without needing to modify the server code, which was the case with Apache 1.3. In order to make the filtering mechanism efficient and avoid unnecessary copying, the I<Bucket Brigades> technology was introduced. A bucket represents a chunk of data. Buckets linked together form a brigade. Each bucket in a brigade can be modified, removed and replaced with another bucket. The goal is to minimize the data copying where possible. Buckets come in different types, such as files, data blocks, end of stream indicators, pools, etc. To manipulate a bucket one doesn't need to know its internal representation. The stream of data is represented by bucket brigades. When a filter is called it gets passed the brigade that was the output of the previous filter. This brigade is then manipulated by the filter (e.g., by modifying some buckets) and passed to the next filter in the stack. The following figure depicts an imaginary bucket brigade: =for html <img src="bucket_brigades.gif" width="590" height="400" align="center" valign="middle" alt="bucket brigades"><br><br> The figure tries to show that after the presented bucket brigade has passed through several filters some buckets were removed, some modified and some added.
Of course the handler that gets the brigade cannot tell the history of the brigade; it can only see the existing buckets in the brigade. Bucket brigades are discussed in detail in the L<connection protocols|docs::2.0::user::handlers::protocols> and L<I/O filtering|docs::2.0::user::handlers::filters> chapters. =head1 Single Phase's Multiple Handlers Behavior For each phase there can be more than one handler assigned (also known as I<hooks>, because the C functions are called I<ap_hook_E<lt>phase_nameE<gt>>). Phases' behavior varies when there is more than one handler registered to run for the same phase. The following table specifies each handler's behavior in this situation: Directive Type -------------------------------------- PerlOpenLogsHandler RUN_ALL PerlPostConfigHandler RUN_ALL PerlChildInitHandler VOID PerlChildExitHandler XXX PerlPreConnectionHandler RUN_ALL PerlProcessConnectionHandler RUN_FIRST PerlPostReadRequestHandler RUN_ALL PerlTransHandler RUN_FIRST PerlInitHandler RUN_ALL PerlHeaderParserHandler RUN_ALL PerlAccessHandler RUN_ALL PerlAuthenHandler RUN_FIRST PerlAuthzHandler RUN_FIRST PerlTypeHandler RUN_FIRST PerlFixupHandler RUN_ALL PerlResponseHandler RUN_FIRST PerlLogHandler RUN_ALL PerlCleanupHandler XXX PerlInputFilterHandler VOID PerlOutputFilterHandler VOID And here is the description of the possible types: =over =item * VOID Handlers of the type C<VOID> will I<all> be executed in the order they have been registered, disregarding their return values. In mod_perl they are nevertheless expected to return C<Apache::OK>. =item * RUN_FIRST Handlers of the type C<RUN_FIRST> will be executed in the order they have been registered until the first handler that returns something other than C<Apache::DECLINED>. If the return value is C<Apache::DECLINED>, the next handler in the chain will be run. If the return value is C<Apache::OK> the next phase will start. In all other cases the execution will be aborted.
=item * RUN_ALL Handlers of the type C<RUN_ALL> will be executed in the order they have been registered until the first handler that returns something other than C<Apache::OK> or C<Apache::DECLINED>. =back For C API declarations see I<include/ap_config.h>, which includes other types which aren't exposed by mod_perl handlers. Also see L<mod_perl Directives Argument Types and Allowed Location|docs::2.0::user::config::config/mod_perl_Directives_Argument_Types_and_Allowed_Location> =head1 Hook Ordering (Position) The following constants specify how the new hooks (handlers) are inserted into the list of hooks when there is at least one hook already registered for the same phase. META: need to verify the following: =over =item * C<APR::HOOK_REALLY_FIRST> run this hook first, before ANYTHING. =item * C<APR::HOOK_FIRST> run this hook first. =item * C<APR::HOOK_MIDDLE> run this hook somewhere. =item * C<APR::HOOK_LAST> run this hook after every other hook which is defined. =item * C<APR::HOOK_REALLY_LAST> run this hook last, after EVERYTHING. =back META: more information in mod_example.c talking about position/predecessors, etc. =head1 Maintainers Maintainer is the person(s) you should contact with updates, corrections and patches. =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back =head1 Authors =over =item * =back Only the major authors are listed above. For contributors see the Changes file. =cut 1.1 modperl-docs/src/docs/2.0/user/handlers/protocols.pod Index: protocols.pod =================================================================== =head1 NAME Protocol Handlers =head1 Description This chapter explains how to implement Protocol (Connection) Handlers in mod_perl. =head1 Connection Cycle Phases As we saw earlier, each child server (be it a thread or a process) is engaged in processing connections. Each connection may be served by different connection protocols, e.g., HTTP, POP3, SMTP, etc. 
Each connection may include more than one request, e.g., several HTTP requests can be served over a single connection, when a response includes several images. The following diagram depicts the connection life cycle and highlights which handlers are available to mod_perl 2.0: =for html <img src="connection_cycle.gif" width="598" height="498" align="center" valign="middle" alt="connection cycle"><br><br> When a connection is issued by a client, it's first run through C<PerlPreConnectionHandler> and then passed to the C<PerlProcessConnectionHandler>, which generates the response. When C<PerlProcessConnectionHandler> is reading data from the client, it can be filtered by connection input filters. The generated response can also be filtered through connection output filters. Filters are usually used for modifying the data flowing through them, but can be used for other purposes as well (e.g., logging interesting information). Now let's discuss each of the C<PerlPreConnectionHandler> and C<PerlProcessConnectionHandler> handlers in detail. =head2 PerlPreConnectionHandler The I<pre_connection> phase happens just after the server accepts the connection, but before it is handed off to a protocol module to be served. It gives modules an opportunity to modify the connection as soon as possible and insert filters if needed. The core server uses this phase to set up the connection record based on the type of connection that is being used. mod_perl itself uses this phase to register the connection input and output filters. In mod_perl 1.0 during code development C<Apache::Reload> was used to automatically reload Perl modules modified since the last request. It was invoked during C<post_read_request>, the first HTTP request's phase.
In mod_perl 2.0 I<pre_connection> is the earliest phase, so if we want to make sure that all modified Perl modules are reloaded for any protocol and its phases, it's best to set the scope of the Perl interpreter to the lifetime of the connection via: PerlInterpScope connection and invoke the C<Apache::Reload> handler during the I<pre_connection> phase. However this development-time advantage can become a disadvantage in production--for example if a connection, handled by the HTTP protocol, is configured as C<KeepAlive> and there are several requests coming on the same connection, with only one handled by mod_perl and the others by the default images handler, the Perl interpreter won't be available to other threads while the images are being served. This phase is of type C<L<RUN_ALL|docs::2.0::user::handlers::intro/item_RUN_ALL>>. The handler's configuration scope is C<L<SRV|docs::2.0::user::config::config/item_SRV>>, because it's not known yet which resource the request will be mapped to. XXX: As of this moment C<PerlPreConnectionHandler> is not being executed by mod_perl. Stay tuned. Example: A I<pre_connection> handler accepts connection record and socket objects as its arguments: sub handler { my ($c, $socket) = @_; # ... return Apache::OK; } =head2 PerlProcessConnectionHandler The I<process_connection> phase is used to process incoming connections. Only protocol modules should assign handlers for this phase, as it gives them an opportunity to replace the standard HTTP processing with processing for some other protocols (e.g., POP3, FTP, etc.). This phase is of type C<L<RUN_FIRST|docs::2.0::user::handlers::intro/item_RUN_FIRST>>. The handler's configuration scope is C<L<SRV|docs::2.0::user::config::config/item_SRV>>. Therefore the only way to run protocol servers other than the core HTTP is inside dedicated virtual hosts.
A I<process_connection> handler accepts a connection record object as its only argument; a socket object can be retrieved from the connection record object. sub handler { my ($c) = @_; my $socket = $c->client_socket; # ... return Apache::OK; } Now let's look at the following two examples of connection handlers. The first uses the connection socket to read and write the data, and the second uses bucket brigades to accomplish the same and allow for connection filters to do their work. =head3 Socket-based Protocol Module To demonstrate the workings of a protocol module, we'll take a look at the C<MyApache::EchoSocket> module, which simply echoes the data read back to the client. In this module we will use the implementation that works directly with the connection socket and therefore bypasses connection filters if any. A protocol handler is configured using the C<PerlProcessConnectionHandler> directive and we will use the C<Listen> and C<E<lt>VirtualHostE<gt>> directives to bind to the non-standard port B<8010>: Listen 8010 <VirtualHost _default_:8010> PerlModule MyApache::EchoSocket PerlProcessConnectionHandler MyApache::EchoSocket </VirtualHost> C<MyApache::EchoSocket> is then enabled when starting Apache: panic% httpd And we give it a whirl: panic% telnet localhost 8010 Trying 127.0.0.1... Connected to localhost (127.0.0.1). Escape character is '^]'. Hello Hello fOo BaR fOo BaR Connection closed by foreign host.
Here is the code: file:MyApache/EchoSocket.pm --------------------------- package MyApache::EchoSocket; use strict; use warnings FATAL => 'all'; use Apache::Connection (); use APR::Socket (); use Apache::Const -compile => 'OK'; use constant BUFF_LEN => 1024; sub handler { my $c = shift; my $socket = $c->client_socket; my $buff; while (1) { my($rlen, $wlen); $rlen = BUFF_LEN; $socket->recv($buff, $rlen); last if $rlen <= 0 or $buff =~ /^[\r\n]+$/; $wlen = $rlen; $socket->send($buff, $wlen); last if $wlen != $rlen; } Apache::OK; } 1; The example handler starts with the standard I<package> declaration and of course, C<use strict;>. As with all C<Perl*Handler>s, the subroutine name defaults to I<handler>. However, in the case of a protocol handler, the first argument is not a C<request_rec>, but a C<conn_rec> blessed into the C<Apache::Connection> class. We have direct access to the client socket via C<Apache::Connection>'s I<client_socket> method. This returns an object blessed into the C<APR::Socket> class. Inside the read/send loop, the handler attempts to read C<BUFF_LEN> bytes from the client socket into the C<$buff> buffer. The C<$rlen> parameter will be set to the number of bytes actually read. The C<APR::Socket::recv()> method returns an APR status value, but we need only check the read length to break out of the loop if it is less than or equal to C<0> bytes. The handler also breaks the loop after processing an input consisting of nothing but newline characters, which is how we abort the connection in the interactive mode. If the handler receives some data, it sends it unmodified back to the client with the C<APR::Socket::send()> method. When the loop is finished the handler returns C<Apache::OK>, telling Apache to terminate the connection. As mentioned earlier, since this handler is working directly with the connection socket, no filters can be applied.
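The loop-exit rules described above can be tried without a server. The following is a plain-Perl model, not mod_perl code: C<echo_session> is a hypothetical helper, and each list element stands for the data returned by one C<recv()> call. It only mirrors the two read-side exit conditions (an empty read, or input made of nothing but newline characters) and leaves out the short-write check.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the handler's read-side exit rules to a
# list of chunks and returns the chunks that would have been echoed.
sub echo_session {
    my @chunks = @_;    # each element models the data of one recv() call
    my @sent;
    for my $buff (@chunks) {
        my $rlen = length $buff;
        # same test as in the handler: stop on an empty read or on
        # input that is nothing but CR/LF characters
        last if $rlen <= 0 or $buff =~ /^[\r\n]+$/;
        push @sent, $buff;    # the real handler calls $socket->send here
    }
    return @sent;
}

my @sent = echo_session("Hello\r\n", "fOo BaR\r\n", "\r\n", "never echoed\r\n");
print @sent;
```

Only the first two chunks are echoed; the bare newline ends the session, so the last chunk is never processed.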
=head3 Bucket Brigades-based Protocol Module Now let's look at the same module, but this time implemented by manipulating bucket brigades, and which runs its output through a connection output filter that turns all uppercase characters into their lowercase equivalents. The following configuration defines a virtual host listening on port 8011, which enables the C<MyApache::EchoBB> connection handler, which will run its output through the C<MyApache::EchoBB::lowercase_filter> filter: Listen 8011 <VirtualHost _default_:8011> PerlModule MyApache::EchoBB PerlProcessConnectionHandler MyApache::EchoBB PerlOutputFilterHandler MyApache::EchoBB::lowercase_filter </VirtualHost> As before we start the httpd server: panic% httpd And try the new connection handler in action: panic% telnet localhost 8011 Trying 127.0.0.1... Connected to localhost (127.0.0.1). Escape character is '^]'. Hello hello fOo BaR foo bar Connection closed by foreign host. As you can see, the response is now all in lower case, because of the output filter. And here is the implementation of the connection and the filter handlers.
file:MyApache/EchoBB.pm ----------------------- package MyApache::EchoBB; use strict; use warnings FATAL => 'all'; use Apache::Connection (); use APR::Bucket (); use APR::Brigade (); use APR::Util (); use APR::Const -compile => qw(SUCCESS EOF); use Apache::Const -compile => qw(OK MODE_GETLINE); sub handler { my $c = shift; my $bb_in = APR::Brigade->new($c->pool, $c->bucket_alloc); my $bb_out = APR::Brigade->new($c->pool, $c->bucket_alloc); my $last = 0; while (1) { my $rv = $c->input_filters->get_brigade($bb_in, Apache::MODE_GETLINE); if ($rv != APR::SUCCESS or $bb_in->empty) { my $error = APR::strerror($rv); unless ($rv == APR::EOF) { warn "get_brigade: $error\n"; } $bb_in->destroy; last; } while (!$bb_in->empty) { my $bucket = $bb_in->first; $bucket->remove; if ($bucket->is_eos) { $bb_out->insert_tail($bucket); last; } my $data; my $status = $bucket->read($data); return $status unless $status == APR::SUCCESS; if ($data) { $last++ if $data =~ /^[\r\n]+$/; # could do something with the data here $bucket = APR::Bucket->new($data); } $bb_out->insert_tail($bucket); } my $b = APR::Bucket::flush_create($c->bucket_alloc); $bb_out->insert_tail($b); $c->output_filters->pass_brigade($bb_out); last if $last; } Apache::OK; } use base qw(Apache::Filter); use constant BUFF_LEN => 1024; sub lowercase_filter : FilterConnectionHandler { my $filter = shift; while ($filter->read(my $buffer, BUFF_LEN)) { $filter->print(lc $buffer); } return Apache::OK; } 1; For the purpose of explaining how this connection handler works, we are going to simplify the handler. The whole handler can be represented by the following pseudo-code: while ($bb_in = get_brigade()) { while ($bucket_in = $bb_in->get_bucket()) { my $data = $bucket_in->read(); # do something with data $bucket_out = new_bucket($data); $bb_out->insert_tail($bucket_out); } $bb_out->insert_tail($flush_bucket); pass_brigade($bb_out); } The handler receives the incoming data via bucket brigades, one at a time in a loop.
It then processes each brigade, by retrieving the buckets contained in it, reading the data in, then creating new buckets using the received data, and attaching them to the outgoing brigade. When all the buckets from the incoming bucket brigade have been transformed and attached to the outgoing bucket brigade, a flush bucket is created and added as the last bucket, so when the outgoing bucket brigade is passed out to the outgoing connection filters, it won't be buffered but sent to the client right away. If you look at the complete handler, the loop is terminated when one of the following conditions occurs: an error happens, the end of stream bucket has been seen (no more input at the connection) or the received data contains nothing but newline characters, which we used to tell the server to terminate the connection. Notice that this handler could be much simpler, since we don't modify the data. We could simply pass the whole brigade unmodified without even looking at the buckets. But from this example you can see how to write a connection handler where you actually want to read and/or modify the data. To accomplish that modification simply add code that transforms the data which has been read from the bucket before it's inserted into the outgoing brigade. We will skip the filter discussion here, since we are going to talk in depth about filters in the sections dedicated to filters. All you need to know at this stage is that the data sent from the connection handler is filtered by the outgoing filter, which transforms it to be all lowercase. =head1 Maintainers Maintainer is the person(s) you should contact with updates, corrections and patches. =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back =head1 Authors =over =item * =back Only the major authors are listed above. For contributors see the Changes file.
=cut 1.1 modperl-docs/src/docs/2.0/user/handlers/server.pod Index: server.pod =================================================================== =head1 NAME Server Life Cycle Handlers =head1 Description This chapter discusses the server life cycle and the mod_perl handlers participating in it. =head1 Server Life Cycle The following diagram depicts the Apache 2.0 server life cycle and highlights which handlers are available to mod_perl 2.0: =for html <img src="server_life_cycle.gif" width="561" height="537" align="center" valign="middle" alt="server life cycle"><br><br> Apache 2.0 starts by parsing the configuration file. After the configuration file is parsed, the C<PerlOpenLogsHandler> handlers are executed if any. After that it's the turn of the C<PerlPostConfigHandler> handlers to run. When the I<post_config> phase is finished the server immediately restarts, to make sure that it can survive graceful restarts after starting to serve the clients. When the restart is completed, Apache 2.0 spawns the workers that will do the actual work. Depending on the MPM used, these can be threads, processes or a mixture of both. For example the I<worker> MPM spawns a number of processes, each running a number of threads. When each child process is started the C<PerlChildInitHandler> handlers are executed. Notice that they are run for each starting process, not for each thread. From that moment on each working thread processes connections until it's killed by the server or the server is shut down.
=head2 Startup Phases Demonstration Module Let's look at the following example that demonstrates all the startup phases: file:MyApache/StartupLog.pm --------------------------- package MyApache::StartupLog; use strict; use warnings; use Apache::Log (); use File::Spec::Functions; use Apache::Const -compile => 'OK'; my $log_file = catfile "logs", "startup_log"; my $log_fh; sub open_logs { my($conf_pool, $log_pool, $temp_pool, $s) = @_; my $log_path = Apache::server_root_relative($conf_pool, $log_file); $s->warn("opening the log file: $log_path"); open $log_fh, ">>$log_path" or die "can't open $log_path: $!"; my $oldfh = select($log_fh); $| = 1; select($oldfh); say("process $$ is born to reproduce"); return Apache::OK; } sub post_config { my($conf_pool, $log_pool, $temp_pool, $s) = @_; say("configuration is completed"); return Apache::OK; } sub child_init { my($child_pool, $s) = @_; say("process $$ is born to serve"); return Apache::OK; } sub say { my($caller) = (caller(1))[3] =~ /([^:]+)$/; printf $log_fh "[%s] - %-11s: %s\n", scalar(localtime), $caller, $_[0]; } END { say("process $$ is shutdown\n"); } 1; And the I<httpd.conf> configuration section: PerlModule MyApache::StartupLog PerlOpenLogsHandler MyApache::StartupLog::open_logs PerlPostConfigHandler MyApache::StartupLog::post_config PerlChildInitHandler MyApache::StartupLog::child_init When we perform a server startup followed by a shutdown, the I<logs/startup_log> is created if it didn't exist already (it shares the same directory with I<error_log> and other standard log files), and each stage appends to it its log information. 
So when we perform: % bin/apachectl start && bin/apachectl stop the following is getting logged to I<logs/startup_log>: [Thu Aug 22 15:57:08 2002] - open_logs : process 21823 is born to reproduce [Thu Aug 22 15:57:08 2002] - post_config: configuration is completed [Thu Aug 22 15:57:09 2002] - END : process 21823 is shutdown [Thu Aug 22 15:57:10 2002] - open_logs : process 21825 is born to reproduce [Thu Aug 22 15:57:10 2002] - post_config: configuration is completed [Thu Aug 22 15:57:11 2002] - child_init : process 21830 is born to serve [Thu Aug 22 15:57:11 2002] - child_init : process 21831 is born to serve [Thu Aug 22 15:57:11 2002] - child_init : process 21832 is born to serve [Thu Aug 22 15:57:11 2002] - child_init : process 21833 is born to serve [Thu Aug 22 15:57:12 2002] - END : process 21825 is shutdown First of all, we can clearly see that Apache always restarts itself after the first I<post_config> phase is over. The logs show that the I<post_config> phase is preceded by the I<open_logs> phase. Only after Apache has restarted itself and has completed the I<open_logs> and I<post_config> phases again is the I<child_init> phase run for each child process. In our example we had the setting C<StartServers=4>, therefore you can see that four child processes were started. Finally you can see that on server shutdown the END {} block has been executed by the parent server only. Apache also specifies the I<pre_config> phase, which is executed before the configuration files are parsed, but this is of no use to mod_perl, because mod_perl is loaded only during the configuration phase. Now let's discuss each of the mentioned startup handlers and their implementation in the C<MyApache::StartupLog> module in detail. =head2 PerlOpenLogsHandler The I<open_logs> phase happens just before the I<post_config> phase. Handlers registered by C<PerlOpenLogsHandler> are usually used for opening module-specific log files.
At this stage the C<STDERR> stream is not yet redirected to I<error_log>, and therefore any messages to that stream will be printed to the console the server is starting from (if such exists). This phase is of type C<L<RUN_ALL|docs::2.0::user::handlers::intro/item_RUN_ALL>>. The handler's configuration scope is C<L<SRV|docs::2.0::user::config::config/item_SRV>>. As we have seen in the C<MyApache::StartupLog::open_logs> handler, the I<open_logs> phase handlers accept four arguments: the configuration pool, the logging streams pool, the temporary pool and the server object: sub open_logs { my($conf_pool, $log_pool, $temp_pool, $s) = @_; my $log_path = Apache::server_root_relative($conf_pool, $log_file); $s->warn("opening the log file: $log_path"); open $log_fh, ">>$log_path" or die "can't open $log_path: $!"; my $oldfh = select($log_fh); $| = 1; select($oldfh); say("process $$ is born to reproduce"); return Apache::OK; } In our example the handler uses the function C<Apache::server_root_relative()> to set the full path to the log file, which is then opened for appending and set to unbuffered mode. Finally it logs the fact that it's running in the parent process. As you've seen in the example this handler is configured by adding to I<httpd.conf>: PerlOpenLogsHandler MyApache::StartupLog::open_logs =head2 PerlPostConfigHandler The I<post_config> phase happens right after Apache has processed the configuration files, before any child processes were spawned (which happens at the I<child_init> phase). This phase can be used for initializing things to be shared between all child processes. You can do the same in the startup file, but in the I<post_config> phase you have an access to a complete configuration tree. META: once mod_perl will have the API for that. This phase is of type C<L<RUN_ALL|docs::2.0::user::handlers::intro/item_RUN_ALL>>. The handler's configuration scope is C<L<SRV|docs::2.0::user::config::config/item_SRV>>. 
In our C<MyApache::StartupLog> example we used the I<post_config()> handler: sub post_config { my($conf_pool, $log_pool, $temp_pool, $s) = @_; say("configuration is completed"); return Apache::OK; } As you can see, its arguments are identical to the I<open_logs> phase's handler. This example handler does little more than log that the configuration was completed and return right away. As you've seen in the example this handler is configured by adding to I<httpd.conf>: PerlPostConfigHandler MyApache::StartupLog::post_config =head2 PerlChildInitHandler The I<child_init> phase happens immediately after the child process is spawned. Each child process (not a thread!) will run the hooks of this phase only once in its lifetime. In the prefork MPM this phase is useful for initializing any data structures which should be private to each process. For example C<Apache::DBI> pre-opens database connections during this phase and C<Apache::Resource> sets the process' resource limits. This phase is of type C<L<VOID|docs::2.0::user::handlers::intro/item_VOID>>. The handler's configuration scope is C<L<SRV|docs::2.0::user::config::config/item_SRV>>. In our C<MyApache::StartupLog> example we used the I<child_init()> handler: sub child_init { my($child_pool, $s) = @_; say("process $$ is born to serve"); return Apache::OK; } The I<child_init()> handler accepts two arguments: the child process pool and the server object. The example handler logs the pid of the child process it's run in and returns. As you've seen in the example this handler is configured by adding to I<httpd.conf>: PerlChildInitHandler MyApache::StartupLog::child_init =head2 PerlChildExitHandler META: not implemented yet =head1 Maintainers Maintainer is the person(s) you should contact with updates, corrections and patches. =over =item * Stas Bekman E<lt>stas (at) stason.orgE<gt> =back =head1 Authors =over =item * =back Only the major authors are listed above. For contributors see the Changes file.
=cut 1.28 +1 -1 modperl-docs/src/docs/2.0/user/install/install.pod Index: install.pod =================================================================== RCS file: /home/cvs/modperl-docs/src/docs/2.0/user/install/install.pod,v retrieving revision 1.27 retrieving revision 1.28 diff -u -r1.27 -r1.28 --- install.pod 25 Aug 2002 16:20:52 -0000 1.27 +++ install.pod 2 Sep 2002 06:34:51 -0000 1.28 @@ -4,7 +4,7 @@ =head1 Description -This chapter provides an indepth mod_perl 2.0 installation coverage. +This chapter provides an in-depth mod_perl 2.0 installation coverage. =head1 Prerequisites