Repost with code: Newbie mysterious 500 error with PerlFixupHandlers

judy selen Mon, 05 Feb 2001 08:06:33 -0800

Sorry I forgot to include these files with the earlier message.
Does anyone have any clues with this problem.
I have searched the archives for internal server errors and
PerlFixupHandlers 
and found no similar problems. Perhaps this is a mod_dir issue.
Judy


judy selen wrote:
> 
>   We are using
>   RedHat Linux apache 1.3.14 mod_perl_1.23_01
> 
>   Our site is set up to generate pages using a templating system to add
>   custom headers footers and sidebars. As well we use .htaccess files to
>   further customize the subprocess_env.
> 
>   We are using custom modules installed as PerlFixupHandlers both for
>  template generation and multilingual negotiation
> 
> Negotiation.pm  handles lingual content negotiation. It determines
>  what version of the page to present based on the users preferred
>  variants and the available variants for that page.When the best match
> is found, it
> performs an internal redirect to that document.
> 
> Template.pm is installed as a fixup handler, and is controlled through
> various PerlSetVar directives.
>  Basically, it determines what to include at the beginning and end of
> the web page.
> fixup sets the content handler for the request.
>  Also has two subroutines, handle_rec for static pages ,and handle_cgi
> for scripts These allow the auto-inclusion of headers and footers of web
> pages.
> 
>  In httpd.conf
> --------------
>   DirectoryIndex index.cgi index.html
> 
>   PerlRequire /www/perl/libs/WRI/Template.pm
>   PerlRequire /www/perl/libs/WRI/Negotiation.pm
> 
>   PerlFixupHandler  WRI::Negotiation WRI::Template::fixup
> ---------------------------------------------------
> 
>    These modules work 90% of the time....
> 
>   The problem occurs when a directory is requested, and served as a
>  negotiated document
>   and is passed to our content handler. (handle_cgi)
>   A save-as window pops up in the browser and Apache logs a 500
>   error. (The file it is trying to save is the standard 500 internal
> error message.)
>   In the Template::fixup handler, the
>  document is being detected as a cgi-script, not a server-parsed
>  document.  The 500 occurs because we are trying to execute a static
>  document or directory.
> 
>  Template.pm
>  In run_cgi:
> 
>   unless (open(CGI,'-|')) {
>          #don't need the writing handle to the pipe anymore
>          close(WR);
> 
>          #make STDIN be the read handle of the pipe (so we can get
>          #POST information from Apache)
>          open(STDIN,'<&RD');
> 
>          #Run in the directory of the file
>          $r->chdir_file($filename);
> 
> ### This returns 500 when a directory or static file is negotiated ###
> 
>          #Run the file
>          exec($filename) || return SERVER_ERROR;
>      }
> 
> 
> Error Log:
> 
>  [Wed Jan 31 13:04:52 2001] [error] Can't use string ("500") as a symbol
>   ref while "strict refs" in use at /www/perl/libs/WRI/Template.pm line
>   302.
> 
>   [Wed Jan 31 13:04:55 2001] [error] Usage: Apache::cgi_header_out(r,
>  key,
>   ...) at /www/perl/libs/WRI/Template.pm line 433.
> 
>   These lines numbers may not be accurate since we have added and
> deleted
>   debugging code..
> 
> 
>   Template.pm - handle_cgi:
> 
>     #FIXME - needs error checking.  If run CGI fails, $cgi will have
>      #        the number 500 instead of a file handle, and will fail
>  later on
> 
>       my $cgi = run_cgi($r, $r->filename(), $r->uri());
> 
>   line 302:
>    while (<$cgi>) {
> 
>           #remove the trailing body and html tags from the CGI output
>           #if we are using includes
>           if ($addincs) {
> 
>               s!</(body|html)[^>]*>!!gio;
>           }
> 
>           #print out the line that we just read
>           $r->print($_);
>       }
> 
> .
> .
> .
> 
>     do {
>               #Quit if this is the end of the headers
>               last if ($_ eq $/);
> 
>               #remove the LF or CRLF
>               chomp;
> 
>               #Give apache the header line
>               unless ($is_sub) {
>     line 433:      $r->cgi_header_out(split(/:\s*/,$_,2));
>               }
>           } while (<CGI>);
> 
> 
>   ---------------------------------------------
> 
>   My understanding is that when OK or DECLINED is returned from these
>   modules in the list of fixup handlers, control passes to the next,
>   and when DONE is returned, the remaining  are
>   skipped mod_dir regains control and the next default file is requested
>   (index.cgi index.html)
> 
>   Is this correct or is there something I am misunderstanding.
> 
> Request Outline
> ---------------
> mod_dir sends directory request to Negotiation.pm
> 
> /products/student/calcwiz/
> returns OK from Nego.pm
> mod_dir then sends index.cgi - index.html
> Nego.pm negotiates index.cgi and each of available variations
>     then index.html ......
> 
> when found most appropriate
> Negotiating:
> /products/student/calcwiz/index.en.html
> 
> Template.pm
>  handling request..
> /products/student/calcwiz/index.en.html
> 
>   Another Clue
>   This error recurs when the page is reloaded 25 - 30 times
>   MaxRequestsPerChild is 30
>  Could this be related to our problem.
> 
>  I appreciate any clues anyone may have to why this problem occurs.
>  Thanks in advance,
>  Judy

package WRI::Template;
use strict;
use Apache::Constants qw(:common :methods);
use Apache::Log;
use Apache::File;

#PURPOSE:  This package allows the auto-inclusion of headers and footers
#          of web pages.  It is installed as a fixup handler, and is
#          controlled through various PerlSetVar directives
#

#CONFIGURATION: This module uses several PerlSetVar directives for 
#               configuration.  They are:
#                  PerlSetVar AutoIncludes on
#                  PerlSetVar IncludeHeader filename
#                  PerlSetVar IncludeFooter filename
#                  PerlSetVar handle:handlername
#                  PerlSetVar NoCache 1
#
#               These directives are described in the POD document at
#               the end of this file
#
#               Also, a -head file is used during the inclusion process
#

#BASIC OUTLINE: Normally, you would think that this type of module
#               would be installed as a Content Handler.  However,
#               then it would have to handle all the requests _and_
#               subrequests made (for the header, body, and footer).
#               As a fixup handler, it can simply check to see if it
#               is the main request and, if so, execute, otherwise
#               it will pass control on to the modules that normally
#               handle such requests.
#
#               If it detects that this is the main request, and that
#               includes are turned on.  Then it checks
#               what the content handler was guessed to be.  It then
#               checks the PerlSetVar handle: directives, to see if
#               it is supposed to handle requests of this type.  
#               If so, it loads its own content handler instead of the
#               one Apache thought it should use.  It loads a different
#               handler for CGI scripts than for static pages.
#
#FIXME - maybe whether or not includes are used should be based on the
#        content type and not necessarily the handler.  Just a thought
#

my $DEBUG = 0;
my $handlers = {'cgi-script' => \&handle_cgi,
                'server-parsed' => \&handle_req,
                '*' => \&handle_cgi};

sub fixup {
    my $r = shift; 

    #check to see that includes are turned on, this is the main request,
    #and we are supposed to handle requests of this type
    unless ($r->is_main() &&
            lc($r->subprocess_env('AutoIncludes')) eq 'on' &&
            $r->subprocess_env('handle:'.$r->handler()) &&
            !($r->notes->{'NoTemplate'})) {
        return DECLINED;
    }

    #FIXME - I don't think this is used anywhere anymore.  This is
    #        now accomplished by the EnvFixup module.  This is used
    #        to tell what the main file is being generated.
    $r->subprocess_env('MAIN_SCRIPT_FILENAME', $r->filename());

    #Set the content handler to be this handler.  We have two handlers,
    #a CGI one and one for everything else.  CGI's have to be handled 
    #differently because they can have different titles and meta tags 
    #depending on how they are executed, while HTML pages can simply have
    #their titles read directly from a file

    #Push handlers causes lots of problems in other things (most notably
    #Apache::Registry scripts).  That's why we use set_handlers.

    # This stuff below allows us to use a 'handle-as:support-faq server-parsed'
    # sort of thing (means handle the 'support-faq' handler as the equivalent to
    # what the template module would do with server-parsed content.

    my $k;
    foreach $k (keys %{$r->subprocess_env})
    {
        my $handler;
        next
            unless ($handler) = $k =~ /^handle-as:(.+)/;

        $handlers->{$handler} = $handlers->{$r->subprocess_env($k)};
    }

    $r->set_handlers(PerlHandler => [$handlers->{$r->handler()}
                                     || $handlers->{'*'}]);

    $r->handler('perl-script');
    
    #delete the environment at the end of the request (this may happen
    #anyway)
    $r->register_cleanup(sub { undef(%ENV); });
    
    return OK;
}

#PURPOSE:  This subroutine handles the template module for static pages
#
sub handle_req {
    my $r = shift;

    my $log = $r->log();

    #Make sure the file exists.  If not, log an error, and return NOT FOUND
    if (! (-e $r->filename())) {
        $log->error("Url does not exist: ".$r->uri());
        return NOT_FOUND;
    }

    #FIXME - 
    #We need to disallow appending any PATH_INFO to regular 
    #requests because it messes up URLs a lot.  However, sometimes
    #valid URLs ending in a "/" get that slash added to
    #$r->path_info() (not sure why).  However, we need to do something
    #(maybe redirect?) to keep people from putting in URLs with
    #extra slashes, because it messes LOTS of things up.

    #get subrequests without running them, and update the timestamp
    #in case it's a HEAD request
    my $subr_header = 
        include(
                $r, 
                $r->subprocess_env('IncludeHeader'), 
                q(NO_RUN)
        );
    $r->update_mtime((stat $subr_header->filename())[9]);

    my $subr_footer = 
        include(
                $r, 
                $r->subprocess_env('IncludeFooter'), 
                q(NO_RUN)
        );
    $r->update_mtime((stat $subr_footer->filename())[9]);

    #update the timestamp with the current file name
    $r->update_mtime((stat $r->filename())[9]);
    
    #Set last modified and cacheing(ETAG) information unless NoCache is set.
    #NoCache is used for our pages with Server Side Includes that change a lot
    unless ($r->subprocess_env('NoCache'))
    {
        $r->set_last_modified();
        $r->set_etag();
    }

    #Not sure what this does... Chris added it.  I think it has something
    #to do with the HTTP/1.1 protocol, but not sure (like, for a
    #get-if-not-modified-since or whatever that is)
    if ((my $rc = $r->meets_conditions()) != OK) {
        return $rc;
    }

    #send the header
    my $charset = $r->subprocess_env('WRI_CHARSET');
    $r->send_http_header('text/html' 
                         . ($charset ? "; charset=$charset" : ''));
        
    #we're done if they only asked for the header
    return OK if ($r->header_only());

    #Print the page introduction
    print_head_html($r);

    #Read all data that goes between the <head></head> tags from
    #the -head file
    my $head = join('-', $r->filename(), 'head');
    if (open(FH, $head)) {
        $r->send_fd(\*FH);
        close(FH);
    } else {
        $r->print("<title>Wolfram Research, Inc.</title>\n");
    }
    
    #end the header unless configuration requests otherwise
    $r->print("\n</head>\n") unless $r->subprocess_env('KeepHeadOpen');

    #run the header 
    $subr_header->run();
    #run the body
    include($r, $r->uri());
    #run the footer
    $subr_footer->run();

    #FIXME - should this be here or in the templates?  Probably here
    #        because we tack on the beginning <html> tag
    $r->print("\n</html>\n");

    return OK;
}

#PURPOSE:  This handles execution of CGI scripts under the template module.  
#          More work may have to be done to get this to work in C because 
#          mod_perl handles a lot of the environment handling tasks for
#          us here.
sub handle_cgi {
    my $r = shift;

    my $buf;  #buffer for reading in HTML header

    my $log = $r->log();

    #Return NOT FOUND if the CGI script does not exist
    unless (-e $r->filename()) {
        $log->error("Url does not exist: ".$r->uri());
        return NOT_FOUND;
    }

    #Get the CGI file handle.  The run_cgi procedure does a lot of the
    #grunt work for handling requests (forking, sending headers, 
    #transferring POST data, etc)
    #
    #FIXME - needs error checking.  If run CGI fails, $cgi will have 
    #        the number 500 instead of a file handle, and will fail later on
    my $cgi = run_cgi($r, $r->filename(), $r->uri());

    #only add includes
    #     if the CGI execution is successful
    #     and the CGI is spitting out HTML
    my $addincs = (($r->status >= 200) && ($r->status < 300) && 
                   ($r->content_type() eq 'text/html'));


# Browsers suck because they don't follow the spec 
# and so setting Expires: $x Date: $x does not work because
# the browser does not listen to Date: when deciding when Expires:
# occurs (cknite)
# UPDATE - Actually, I think it is only supposed to work if the browser 
#          identifies itself as HTTP/1.1.  Maybe we could re-enable this
#          for HTTP/1.1 browsers (johnnyb)
#   $r->no_cache(1);

    #send the header
    $r->send_http_header();

    #we're done if they only want the header
    return OK if ($r->header_only());

    #If we are using includes, print out the beginning HTML and the <head>
    #information from the CGI, and include the header
    if ($addincs) {
        $r->print("<html>\n<head>\n");

        #look through the CGI output, and print out everything between
        #<head> and </head>.  This will probably be the most difficult thing
        #to do in C.  Also, strip out the <html>, <head>, </head>, and 
        #<body> tags as well.  These can occur anywhere
        #within a line.  The <body> will be the hardest because the tag
        #itself can spread multiple lines.
        while (<$cgi>) {
            s!<html[^>]*>!!gio;
            s!</?head[^>]*>!!gio;
            my $body = s!<body.*?>(.*)!!io;
            $buf .= $_;
            if ($body) {
                $r->print($buf);
                $buf = $1;
                last;
            }
        }

        #Close the <head> tag unless otherwise specified by the configuration
        $r->print("\n</head>\n") unless $r->subprocess_env('KeepHeadOpen');

        include($r, $r->subprocess_env('IncludeHeader'));
        
        #print the rest of the buffer read from the CGI (we do it now,
        #because it should only come after the includes.  This will print
        #everything after the body tag that lies on the same or following
        #line of the body tag
        $r->print($buf) if ($buf);
    }

    #Print out the contents of the CGI
    while (<$cgi>) {

        #remove the trailing body and html tags from the CGI output
        #if we are using includes
        if ($addincs) {
            s!</(body|html)[^>]*>!!gio;
        }

        #print out the line that we just read
        $r->print($_);
    }

    #If we are using includes, put in the footer, and print the ending
    #</html>
    if ($addincs) {
        include($r, $r->subprocess_env('IncludeFooter'));
        $r->print("</html>\n");
    }

    #We're done!
    return OK;
}


#PURPOSE:  Opens a filehandle to a CGI script, also parses the CGI headers
#          into the request object
#
#NOTE:     In C this will have to use many functions from util_script.c
#          in the apache source distribution
#
sub run_cgi {
    my $r = shift;

    my(
       $filename,  #name of the executable
       $uri,       #URI requested by the browser
       $is_sub     #not sure what this is for.  It is always sent as FALSE
    ) = @_;

    local($|);
    $| = 1;

    #Fix mod_perl's junk (mod_perl inappropriately sets the 
    #GATEWAY_INTERFACE on non-Apache::Registry scripts.
    my $old_gw = $ENV{GATEWAY_INTERFACE};
    $ENV{GATEWAY_INTERFACE} = 'CGI/1.1';
    

    #Open up a pipe.  This will be used to give input to the CGI
    pipe(RD,WR) || return SERVER_ERROR;

    #This causes a fork, with the standard output of the forked process
    #going to the CGI handle.  The code withing the braces ({ and }) 
    #is what is executed by the forked process.  We will read from
    #that process through the CGI filehandle
    unless (open(CGI,'-|')) {
        #don't need the writing handle to the pipe anymore
        close(WR); 
        
        #make STDIN be the read handle of the pipe (so we can get 
        #POST information from Apache)
        open(STDIN,'<&RD');

        #Run in the directory of the file
        $r->chdir_file($filename);

        #Run the file
        exec($filename) || return SERVER_ERROR;
    }

    $ENV{GATEWAY_INTERFACE} = $old_gw;

    #be sure to close the CGI at the end.  However, we probably can just
    #do this within our code without having to specifically register a
    #cleanup
    $r->register_cleanup(sub { close(CGI) });

    #The RD filehandle was for the forked process, not us, so we can close it
    close(RD); 

    #Give the CGI any POST data it needs
    #FIXME - need to do a buffered read so that Apache processes don't get
    #        too big
    my $read_buffer;
    $r->read($read_buffer, $r->header_in('Content-Length'));
    print WR $read_buffer;

    #we've printed everything we are going to, so we no longer need th WR
    #filehandle
    close(WR);
    
    #This reads the header data from the CGI script.  It has to handle
    #both CRLF linebreaks and plain LF linebreaks (because Apache handles
    #both)
    {
        #Set linebreak to LF only
        local $/ = "\n";

        #get the first line
        $_ =  <CGI>;

        #Check for CRLF linebreaks
        $/ = "\r\n" if (substr($_, -2, 1) eq "\r");

        #Check to see if the CGI printed a status line.  If so, use it
        if (m!^HTTP/\S+\s+((\d+).*)!o) {

            unless ($is_sub) {
                $r->status_line($1);
                $r->status($2);
            }

            #Read in the next line for processing
            $_ = <CGI>;
        }
        do {
            #Quit if this is the end of the headers
            last if ($_ eq $/);

            #remove the LF or CRLF
            chomp;

            #Give apache the header line
            unless ($is_sub) {
                $r->cgi_header_out(split(/:\s*/,$_,2));
            }
        } while (<CGI>); 
    }

    #Return the file handle for further processing
    return \*CGI;
}


#PURPOSE:  This subroutine handles all the subrequests for included pages
sub include {
    my $r = shift;
    my(
       $uri,              #URI of the request
       $no_run_request    #set to TRUE if the request is to only be looked 
                          #up and not run
    ) = @_;

    #get the subrequest object from apache
    my $subr = $r->lookup_uri($uri);

    # Set a note to keep the template from running on the subrequest
    # (this is a feeble hack to attempt to get rid of the infinite
    # template bug).
    $subr->notes->{'NoTemplate'} = 1;

    #The subrequest doesn't get any POST data
    $subr->header_in('Content-Length' => undef);
    
    #Run the subrequest unless $no_run_request is set to TRUE
    if ($no_run_request) {
        return $subr;
    } else {
        return $subr->run();
    }
}

#PURPOSE:  This prints the html header.  This doesn't really need to
#          be a function, and should probably just be included in the main
#          code
sub print_head_html {
    my $r = shift;

    $r->print(<<EOF);
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
EOF
}

1;

__END__


=head1 NAME

Template.pm - Module that implements the wolfram templating system

=head1 AUTHOR

Daryn Sharp - modified extensively by Jonathan Bartlett, Chris Knight, and Max Campos

=head1 SYNOPSIS

In your httpd.conf file

  PerlRequire /www/perl/libs/WRI/Template.pm
  PerlFixupHandler WRI::Template

This should be the last PerlFixupHandler in the chain

In your .htaccess file

  PerlSetVar AutoIncludes on
  PerlSetVar IncludeHeader /includes/header.cgi
  PerlSetVar IncludeFooter /includes/footer.cgi
  PerlSetVar handle:server-parsed yes
  PerlSetVar handle:cgi-script yes

And, if you don't want your page to be cached, you can say

  PerlSetVar NoCache 1

This module listens to the environment variables

=over 4

=item *

WRI_IS_ROBOT

=back

This module handles the following tasks:

=over 4

=item *

Attaches headers and footers to both CGI scripts and static or server-parsed html pages

=item *

Reformats the page to be robot-friendly if the WRI_IS_ROBOT subprocess_env variable is 
set (NOTE - requires that another module set this variable) by moving headers and 
footers to the end, and getting rid of images (leaves the alt tags), extra formatting 
tags, and extra spaces. 

=back

=head1 Implementation

=head2 Fixup Stage

This module is both a content handler and a fixup handler.  However, it is only 
installed as a fixup handler.  In the fixup stage, it checks to see:

=over 4
=item * 

Is it a main request (you don't EVER templatize subrequests, but you do handle 
non-initial requests, like from a server redirect)

=item * 

Are AutoIncludes turned on?

=item *

Is this module registered to handle the given content?

=back

If all of these check out, the template module registers itself as a content handler.  
It picks the function to register for this task based on whether or not the request is 
for a script(by checking $r->handler).

=head2 Handling Regular Pages (handle_req)

First, we issue subrequests (without running them) to find out the last modified time, 
and then issue the last_modified tag of the latest time.  It then reads the files 
-head file, and prints the HTML header with the contents of -head.  If WRI_IS_ROBOT is 
set, it then prints the main body HTML after running it through the roboticize() 
function to clean it up a bit.  It then runs the subrequest for the header and the 
footer.  Then it prints a final end html tag and returns.

=head2 Handling CGI Scripts (handle_cgi)

First, this gets the filename of the script, and it sets the header in $r and returns 
a filehandle to the CGI using the run_cgi() function.  It then checks to see if it 
should add the included files based on what the result code was (if it was a 500 
internal server error, we don't want to run the includes).  Also, we don't add the 
includes if the content type isn't text/html.

We then send the http header, and finish if its a HEAD request.  Then we read in the 
CGI and parse out the head and body tags, replacing them with what we want (we use the 
CGI's <title>, though).  If WRI_IS_ROBOT is set, we just print a body tag, and save 
the included header for later.  Otherwise, we print the header here.  Then we read all 
of the CGI output.  If WRI_IS_ROBOT is set, we roboticize it, otherwise we just print 
the data.  Then, we print the footer.

=head1 Supporting Functions

=head2 run_cgi

This opens a filehandle to a CGI script, parses out the headers, and sets all of the 
request values based on the headers.  It then returns the CGI filehandle.

=head2 roboticize

This handles all of our robot-friendly conversions, including replacing images with 
just the alt text, and other similar replacements.

package WRI::Negotiation;

use strict;
use Apache::Constants qw(:common :methods :http);

my $DEBUG = 0;
my %charsets = ('ja' => 'Shift_JIS');

sub handler
{
    my $r = shift;
    $DEBUG = $r->dir_config('NegoDebug');

    # we need to negotiate subrequests, except for when it's one of our own
    # sub-requests (denoted by markingof NegoDisable).
    $r->log_error("Negotiating: " . $r->uri)
        if $DEBUG;

    return DECLINED
        if (!$r->is_main && $r->main->notes->{'NegoDisable'});

    # WRI_PREFERRED_VARIANTS is always set regardless of whether we negotiate
    # the request or not.
    my $preferred = GetPreferredVariants($r);
    $r->subprocess_env('WRI_PREFERRED_VARIANTS' => 
                       join(' ', @$preferred));

    $r->log_error("Preferred: " . join(' ', @$preferred))
        if $DEBUG;

    if ($r->dir_config('NegoDisable'))
    {
        $r->log_error("Declining due to NegoDisable directive")
            if $DEBUG;
        return DECLINED;
    }

    # if the file exists (and is a file), then here are the possibilities:
    # 1) It's as a result of this module's internal rediretion
    #    (in which we want to read in the variables set by the last
    #     request's negotiation)
    # 2) It's as a result of this module's internal subrequests (used to
    #    determine available variants). (should not negotiate, just return)
    # 3) It was a straight out request that matched a file directly
    #    and the file was one that would have been negotiated
    #    (ie. GET /index.ja.html) 
    #    (set PREFERRED and AVAILABLE and VARIANT but do not
    #    actually negotiate).
    # 4) It was a straight out request that matched a file directly
    #    and the file was one that would NOT have been negotiated
    #    (ie. GET /blah/index.html where index.html is a file in the 
    #    directory) (set PREFERRED and return).
    # 5) Request is a directory.  Just set PREFERRED and return.

    if (-e $r->finfo())
    {
        if (-f $r->finfo())
        { 
            $r->log_error("Static request.")
                if $DEBUG;
            my $prev;
            if (($prev = $r->prev) && 
                $prev->notes->{'NegoRedirect'})
            {
                # If this is case 1, restore the variables
                # do a little environment cleanup...
                $r->log_error("Restoring variables from previous request.")
                    if $DEBUG;
                delete $prev->notes->{'NegoRedirect'};

                # now retrieve our variables from the previous request.
                my $var = $prev->subprocess_env('WRI_VARIANT');
                $r->subprocess_env('WRI_VARIANT' => $var);
                $r->subprocess_env('WRI_AVAILABLE_VARIANTS' =>
                                   $prev->subprocess_env('WRI_AVAILABLE_VARIANTS'));
                $r->header_out('Content-Language', $var);
                $r->header_out('Vary', 'accept-language');
                $r->header_out('Content-Location', $r->uri());
                $r->subprocess_env('WRI_CHARSET' => $charsets{$var})
                    if $charsets{$var};
            }
            elsif (my ($var) = 
                   substr($r->filename, rindex($r->filename, '/') + 1) 
                   =~ m!\.(\w+)\.!)
            {
                $r->log_error("case 3 for $var")
                 if $DEBUG;
                # else if the filename appears to be something.aa.ext
                # it's case 3, and we'll need to set the AVAILABLE_VARIANTS
                # variable, as well as VARIANT
                $r->subprocess_env('WRI_VARIANT' => $var);
                my $avail = GetAvailableVariants($r, $var);
                $r->subprocess_env('WRI_AVAILABLE_VARIANTS'
                                   => $avail->{str});    
                $r->header_out('Content-Language', $var);
                $r->subprocess_env('WRI_CHARSET' => $charsets{$var})
                    if $charsets{$var};
            }
            # else it's case 4 and we're done.
            $r->log_error("case 4 for $var")
               if $DEBUG;
            return OK;
        }
        else
        {
            # the file exists, but is not a regular file.  It's probably
            # a directory (case 5).  Don't bother!
            $r->log_error("Non-file request.  Aborting.")
                if $DEBUG;
            return OK;
        }
    }
    else
    {
        # file does not exist, it's possible that this is some weird virtual
        # filename, or it truely needs negotiation.
        
        # if it's a request for something.ja.ext and it doesn't exist,
        # do not negotiate.  This indicates a direct request of a
        # specific variant, but is a 404.
        return DECLINED
            if substr($r->filename, 
                      rindex($r->filename, '/') + 1) 
                =~ m!\.(\w+)\.!;

        # return declined if it's not in the form of filename.ext
        # (we do not negotiate filenames w/o extensions).
        return DECLINED
            if substr($r->filename, rindex($r->filename, '/')+1)
                !~ /.+\..+/;

        my $available = GetAvailableVariants($r);

        unless ($available->{error})
        {
            # it truely is a negotiated document.  Pick out the most
            # preferred variant.
            while (my $v = shift @$preferred)
            {
              $r->log_error("Is $v available?")
                  if $DEBUG;
                if ($available->{var}{$v})
                {
                    $r->log_error("Using variant $v, preparing redirect.")
                        if $DEBUG;
                    $r->subprocess_env('WRI_VARIANT' => $v);
                    $r->subprocess_env('WRI_AVAILABLE_VARIANTS' 
                                       => $available->{str});
                    
                    # we cannot redirect here because internal_redirect
                    # can only be called in the response phase.
                    $r->notes->{'NegoRedirectTo'} = $available->{var}{$v}{uri};
                    $r->set_handlers('PerlHandler' => [\&redirect_handler]);
                    $r->handler('perl-script'); 
                    
                    # similarly, some things (namely mod_dir) need to
                    # have an actual real filename set in order to know
                    # that a subrequest was successful.  
                    $r->filename($available->{path} . $available->{var}{$v}{file});
                    return OK;
                }
              $r->log_error("$v not available, get next variant")
                if $DEBUG;
            }
            
            # eventually, this should return some sort of page
            # that lets the user chose between the available
            # variants, for now, we'll just return declined
            return OK;
        }
        else
        {
            $r->log_error($available->{error});
            return OK;
        }
        return OK;
    }
}

sub redirect_handler
{
    # we're in the response phase, it's probably because the current
    # url was negotiated and needs an internal_redirect that can happen
    # only in this phase.
    my $r = shift;

    if (my $to = $r->notes->{'NegoRedirectTo'})
    {
        $r->log_error("Redirecting...")
            if $DEBUG;
        # remove the temp environment variable
        delete $r->notes->{'NegoRedirectTo'};

        # set a new temp environment variable to get the module
        # to grab the variables from the previous request.
        $r->notes->{'NegoRedirect'} = 1;
        $r->internal_redirect($to);
        return DONE;
    }
    else
    {
        return DECLINED;
    }
}

sub GetAvailableVariants
{
    # $myvar is only passed when the uri/filename already include the 
    # variant $myvar.  This is usually not the case.   Normally
    # $myvar is undef.

    my ($r, $myvar) = @_;
    my $filename = $r->filename;

    # break the filname into path and file pieces.
    my %available;
    $available{path} = substr($filename, 0, rindex($filename, '/') + 1);
    my $file = substr($filename, length($available{path}));
    my $uri = $r->uri;
    $available{var} = {};

    # if this is a lingual request, the we need to rewrite it so that
    # it appears to not be. 
    if ($myvar)
    {
        $available{var}{$myvar}{uri} = $uri;
        $available{var}{$myvar}{file} = $file;
        $available{str} = "$myvar=$file";
        $file =~ s!\.$myvar\.!\.!;
        $uri =~ s!\.$myvar\.!\.!;
    }

    $available{file} = $file;

    # Set this variable to keep the subrequest from performing
    # negotiation (negotiation is not needed for these lookup_uris
    # and could cause infinite loops).

    # cycle through all of our available variants
    # and figure out which ones exist.
    my $newfile;
    my $newuri;
    my $v;
    foreach $v (split ' ', $r->dir_config('NegoAvailableVariants'))
    {
        # we don't need to check to see if the current request
        # exists.
        next 
            if $v eq $myvar;

        $newfile = $file;
        $newuri = $uri;

        unless ($newfile =~ s!(.+\.)(.+)$!$1$v.$2! 
                && $newuri =~ s!(.*/)($file)!$1$newfile!)
        {
            $available{error} = "Could not look for variants for '$uri'.";
            return \%available;
        }

        $r->notes->{'NegoDisable'} = 1;
        my $subr = $r->lookup_uri($newuri);
        delete $r->notes->{'NegoDisable'};

        # this is the same test that mod_dir does to see if a URL
        # succeeded or not.
        if ($subr->status == HTTP_OK && (-e $subr->finfo()))
        {
            $r->log_error("Found variant $v")
                if $DEBUG;
            $available{var}{$v}{file} = $newfile;
            $available{var}{$v}{uri} = $newuri;
            $available{str} .= ($available{str} ? ' ' : '') . "$v=$newfile";
        }
    }
    return \%available;
}

sub GetPreferredVariants
{
    my $r = shift;
    # now let's figure out what the order of preference is.
    # and select our return variant at the same time.
    my @preferred; # list of variants in preferential order
    
    my $v;
    foreach $v (split ' ', $r->dir_config('NegoMatchOrder'))
    {
        if ($v eq 'Browser')
        {
            # browser specified language preferences
            push @preferred, split(',', $r->headers_in->{'Accept-language'});
        }
        elsif ($v =~ /Host\((.+)\)/)
        {
            # if it's a Host(something.com=en) block.
            # NegoHostMatch preferences.  NegoHostMatch
            # has the following syntax:
            # NegoHostMatch www.wolfram.co.uk=en www.wolfram.co.jp=ja 
www.wolfram.com=en
            my $host = $r->headers_in->{'Host'};
            my ($hvar) = $v =~ /Host\($host=([\w-]+)\)/;
            push @preferred, $hvar
                if ($hvar);
        }
        else
        {
            push @preferred, $v
                if $v;
        }
    }
    return \@preferred;
}

1;

__END__


=head1 NAME

Negotiation.pm - Module to handle lingual content negotiation reasonably

=head1 AUTHOR

Max Campos

=head1 SYNOPSIS

In your .htaccess file ...

  PerlRequire /www/perl/libs/WRI/Negotiation.pm
  PerlFixupHandler WRI::Negotiation 
  PerlSetVar NegoAvailableVariants "en ja"
  PerlSetVar NegoMatchOrder "Host(www.wolfram.co.jp=ja) Browser en ja"

=head1 WHAT IT DOES

=head2 Definitions

"lingual document" - A document that has a language extension. (such
as: whatever.ja.html, foo.en.cgi, blech.pr.php3).

"negotiated request" - An incoming request for non-lingual document
in which the non-lingual document does not exist but lingual version(s) do.
(ie. a request for something.html, and something.en.html,
something.ja.html exist but something.html does not)

"lingual document request" - an incoming request for a lingual
document. (ie. an incoming request for foo.ja.html).

=head2 Configuration Directives

=over 4

=item NegoDisable 1

Disable negotiation for this directory and all directories below it.

=item NegoAvailableVariants ("en ja")

list of variants to check for when determining what files are
available/etc.

=item NegoMatchOrder ("Browser HostMatch Default")

The order in which to select variants (as well as set the
WRI_PREFERRED_VARIANTS variable).  It's a space separated list with each
item being one of three things:

=over 4

=item Browser 

Adds browser provided Accept-Language variants to the
preference list.

=item Host(hostname.something.com=vr)

Adds variant vr to the preference
list if the client present's a Host: header that matches
hostname.something.com.

=item ja

Anything that does not match the two above specifications is added
to the variant list as is.

=back

=back

=head2 Environment Variables

=over 4

=item WRI_AVAILABLE_VARIANTS 

ja=index.ja.cgi en=index.en.cgi

This is a list of all of the available variants for the current file,
and their filenames.

Presented To: Negotiated requests, lingual document requests

=item WRI_PREFERRED_VARIANTS 

kabuki en pr en ja

=over 4

=item *

This is a list of variants in preferential order as determined by the
NegoMatchOrder directive (this could be any combination of host based
settings, browser presented languages, as well as the default).

=item *

This list may include languages that we have never heard of. 

=item *

This list may include repeats.

=item *

Since this is not based on what is available, we can provide this to all
requests.

=back

Presented to: *all* requests. 

=item WRI_VARIANT

en

=over 4

=item * 

This is the language of the current request.

=item * 

Essentially, this is the first element of WRI_PREFERRED_VARIANTS that is
also listed in WRI_AVAILABLE_VARIANTS

=back

Presented To: Negotiated requests, lingual document requests

=back

=head1 REQUEST PROCESSING

=head2 Process

I<* set WRI_PREFERRED_VARIANTS>

- Does the file requested exist?

Yes?  Do not negotiate. (DONE)

No?  Continue...

- Is it a non-lingual request?  

Yes? Continue...

No? 404.

- Do variants exists?

Yes? Continue...

No? 404.

I<* set WRI_AVAILABLE_VARIANTS>

- Does an appropriate variant for the file exist?

Yes?  set WRI_VARIANT and return it

No?   404.


=head2 Example Scenarios

-> Request for index.html, where only index.html, index.ja.html, 
index.en.html exist.  

<- index.html is returned.

B<------------------------------>

-> Request for index.html, where only index.ja.html index.en.html
exist.

<- Most appropriate document is returned (or 404 if no appropriate

B<------------------------------>

-> Request for index.ja.html where index.html index.ja.html index.en.html
exist.
<- index.ja.html is returned.

-> Request for index.ja.html where index.html index.en.html exist.
<- 404.

=cut

Repost with code: Newbie mysterious 500 error with PerlFixupHandlers

Reply via email to