Sorry I forgot to include these files with the earlier message. Does anyone have any clues with this problem. I have searched the archives for internal server errors and PerlFixupHandlers and found no similar problems. Perhaps this is a mod_dir issue. Judy judy selen wrote: > > We are using > RedHat Linux apache 1.3.14 mod_perl_1.23_01 > > Our site is set up to generate pages using a templating system to add > custom headers footers and sidebars. As well we use .htaccess files to > further customize the subprocess_env. > > We are using custom modules installed as PerlFixupHandlers both for > template generation and multilingual negotiation > > Negotiation.pm handles lingual content negotiation. It determines > what version of the page to present based on the users preferred > variants and the available variants for that page.When the best match > is found, it > performs an internal redirect to that document. > > Template.pm is installed as a fixup handler, and is controlled through > various PerlSetVar directives. > Basically, it determines what to include at the beginning and end of > the web page. > fixup sets the content handler for the request. > Also has two subroutines, handle_rec for static pages ,and handle_cgi > for scripts These allow the auto-inclusion of headers and footers of web > pages. > > In httpd.conf > -------------- > DirectoryIndex index.cgi index.html > > PerlRequire /www/perl/libs/WRI/Template.pm > PerlRequire /www/perl/libs/WRI/Negotiation.pm > > PerlFixupHandler WRI::Negotiation WRI::Template::fixup > --------------------------------------------------- > > These modules work 90% of the time.... > > The problem occurs when a directory is requested, and served as a > negotiated document > and is passed to our content handler. (handle_cgi) > A save-as window pops up in the browser and Apache logs a 500 > error. (The file it is trying to save is the standard 500 internal > error message.) > In the Template::fixup handler, the > document is being detected as a cgi-script, not a server-parsed > document. The 500 occurs because we are trying to execute a static > document or directory. > > Template.pm > In run_cgi: > > unless (open(CGI,'-|')) { > #don't need the writing handle to the pipe anymore > close(WR); > > #make STDIN be the read handle of the pipe (so we can get > #POST information from Apache) > open(STDIN,'<&RD'); > > #Run in the directory of the file > $r->chdir_file($filename); > > ### This returns 500 when a directory or static file is negotiated ### > > #Run the file > exec($filename) || return SERVER_ERROR; > } > > > Error Log: > > [Wed Jan 31 13:04:52 2001] [error] Can't use string ("500") as a symbol > ref while "strict refs" in use at /www/perl/libs/WRI/Template.pm line > 302. > > [Wed Jan 31 13:04:55 2001] [error] Usage: Apache::cgi_header_out(r, > key, > ...) at /www/perl/libs/WRI/Template.pm line 433. > > These lines numbers may not be accurate since we have added and > deleted > debugging code.. > > > Template.pm - handle_cgi: > > #FIXME - needs error checking. If run CGI fails, $cgi will have > # the number 500 instead of a file handle, and will fail > later on > > my $cgi = run_cgi($r, $r->filename(), $r->uri()); > > line 302: > while (<$cgi>) { > > #remove the trailing body and html tags from the CGI output > #if we are using includes > if ($addincs) { > > s!</(body|html)[^>]*>!!gio; > } > > #print out the line that we just read > $r->print($_); > } > > . > . > . > > do { > #Quit if this is the end of the headers > last if ($_ eq $/); > > #remove the LF or CRLF > chomp; > > #Give apache the header line > unless ($is_sub) { > line 433: $r->cgi_header_out(split(/:\s*/,$_,2)); > } > } while (<CGI>); > > > --------------------------------------------- > > My understanding is that when OK or DECLINED is returned from these > modules in the list of fixup handlers, control passes to the next, > and when DONE is returned, the remaining are > skipped mod_dir regains control and the next default file is requested > (index.cgi index.html) > > Is this correct or is there something I am misunderstanding. > > Request Outline > --------------- > mod_dir sends directory request to Negotiation.pm > > /products/student/calcwiz/ > returns OK from Nego.pm > mod_dir then sends index.cgi - index.html > Nego.pm negotiates index.cgi and each of available variations > then index.html ...... > > when found most appropriate > Negotiating: > /products/student/calcwiz/index.en.html > > Template.pm > handling request.. > /products/student/calcwiz/index.en.html > > Another Clue > This error recurs when the page is reloaded 25 - 30 times > MaxRequestsPerChild is 30 > Could this be related to our problem. > > I appreciate any clues anyone may have to why this problem occurs. > Thanks in advance, > Judy
package WRI::Template; use strict; use Apache::Constants qw(:common :methods); use Apache::Log; use Apache::File; #PURPOSE: This package allows the auto-inclusion of headers and footers # of web pages. It is installed as a fixup handler, and is # controlled through various PerlSetVar directives # #CONFIGURATION: This module uses several PerlSetVar directives for # configuration. They are: # PerlSetVar AutoIncludes on # PerlSetVar IncludeHeader filename # PerlSetVar IncludeFooter filename # PerlSetVar handle:handlername # PerlSetVar NoCache 1 # # These directives are described in the POD document at # the end of this file # # Also, a -head file is used during the inclusion process # #BASIC OUTLINE: Normally, you would think that this type of module # would be installed as a Content Handler. However, # then it would have to handle all the requests _and_ # subrequests made (for the header, body, and footer). # As a fixup handler, it can simply check to see if it # is the main request and, if so, execute, otherwise # it will pass control on to the modules that normally # handle such requests. # # If it detects that this is the main request, and that # includes are turned on. Then it checks # what the content handler was guessed to be. It then # checks the PerlSetVar handle: directives, to see if # it is supposed to handle requests of this type. # If so, it loads its own content handler instead of the # one Apache thought it should use. It loads a different # handler for CGI scripts than for static pages. # #FIXME - maybe whether or not includes are used should be based on the # content type and not necessarily the handler. Just a thought # my $DEBUG = 0; my $handlers = {'cgi-script' => \&handle_cgi, 'server-parsed' => \&handle_req, '*' => \&handle_cgi}; sub fixup { my $r = shift; #check to see that includes are turned on, this is the main request, #and we are supposed to handle requests of this type unless ($r->is_main() && lc($r->subprocess_env('AutoIncludes')) eq 'on' && $r->subprocess_env('handle:'.$r->handler()) && !($r->notes->{'NoTemplate'})) { return DECLINED; } #FIXME - I don't think this is used anywhere anymore. This is # now accomplished by the EnvFixup module. This is used # to tell what the main file is being generated. $r->subprocess_env('MAIN_SCRIPT_FILENAME', $r->filename()); #Set the content handler to be this handler. We have two handlers, #a CGI one and one for everything else. CGI's have to be handled #differently because they can have different titles and meta tags #depending on how they are executed, while HTML pages can simply have #their titles read directly from a file #Push handlers causes lots of problems in other things (most notably #Apache::Registry scripts). That's why we use set_handlers. # This stuff below allows us to use a 'handle-as:support-faq server-parsed' # sort of thing (means handle the 'support-faq' handler as the equivalent to # what the template module would do with server-parsed content. my $k; foreach $k (keys %{$r->subprocess_env}) { my $handler; next unless ($handler) = $k =~ /^handle-as:(.+)/; $handlers->{$handler} = $handlers->{$r->subprocess_env($k)}; } $r->set_handlers(PerlHandler => [$handlers->{$r->handler()} || $handlers->{'*'}]); $r->handler('perl-script'); #delete the environment at the end of the request (this may happen #anyway) $r->register_cleanup(sub { undef(%ENV); }); return OK; } #PURPOSE: This subroutine handles the template module for static pages # sub handle_req { my $r = shift; my $log = $r->log(); #Make sure the file exists. If not, log an error, and return NOT FOUND if (! (-e $r->filename())) { $log->error("Url does not exist: ".$r->uri()); return NOT_FOUND; } #FIXME - #We need to disallow appending any PATH_INFO to regular #requests because it messes up URLs a lot. However, sometimes #valid URLs ending in a "/" get that slash added to #$r->path_info() (not sure why). However, we need to do something #(maybe redirect?) to keep people from putting in URLs with #extra slashes, because it messes LOTS of things up. #get subrequests without running them, and update the timestamp #in case it's a HEAD request my $subr_header = include( $r, $r->subprocess_env('IncludeHeader'), q(NO_RUN) ); $r->update_mtime((stat $subr_header->filename())[9]); my $subr_footer = include( $r, $r->subprocess_env('IncludeFooter'), q(NO_RUN) ); $r->update_mtime((stat $subr_footer->filename())[9]); #update the timestamp with the current file name $r->update_mtime((stat $r->filename())[9]); #Set last modified and cacheing(ETAG) information unless NoCache is set. #NoCache is used for our pages with Server Side Includes that change a lot unless ($r->subprocess_env('NoCache')) { $r->set_last_modified(); $r->set_etag(); } #Not sure what this does... Chris added it. I think it has something #to do with the HTTP/1.1 protocol, but not sure (like, for a #get-if-not-modified-since or whatever that is) if ((my $rc = $r->meets_conditions()) != OK) { return $rc; } #send the header my $charset = $r->subprocess_env('WRI_CHARSET'); $r->send_http_header('text/html' . ($charset ? "; charset=$charset" : '')); #we're done if they only asked for the header return OK if ($r->header_only()); #Print the page introduction print_head_html($r); #Read all data that goes between the <head></head> tags from #the -head file my $head = join('-', $r->filename(), 'head'); if (open(FH, $head)) { $r->send_fd(\*FH); close(FH); } else { $r->print("<title>Wolfram Research, Inc.</title>\n"); } #end the header unless configuration requests otherwise $r->print("\n</head>\n") unless $r->subprocess_env('KeepHeadOpen'); #run the header $subr_header->run(); #run the body include($r, $r->uri()); #run the footer $subr_footer->run(); #FIXME - should this be here or in the templates? Probably here # because we tack on the beginning <html> tag $r->print("\n</html>\n"); return OK; } #PURPOSE: This handles execution of CGI scripts under the template module. # More work may have to be done to get this to work in C because # mod_perl handles a lot of the environment handling tasks for # us here. sub handle_cgi { my $r = shift; my $buf; #buffer for reading in HTML header my $log = $r->log(); #Return NOT FOUND if the CGI script does not exist unless (-e $r->filename()) { $log->error("Url does not exist: ".$r->uri()); return NOT_FOUND; } #Get the CGI file handle. The run_cgi procedure does a lot of the #grunt work for handling requests (forking, sending headers, #transferring POST data, etc) # #FIXME - needs error checking. If run CGI fails, $cgi will have # the number 500 instead of a file handle, and will fail later on my $cgi = run_cgi($r, $r->filename(), $r->uri()); #only add includes # if the CGI execution is successful # and the CGI is spitting out HTML my $addincs = (($r->status >= 200) && ($r->status < 300) && ($r->content_type() eq 'text/html')); # Browsers suck because they don't follow the spec # and so setting Expires: $x Date: $x does not work because # the browser does not listen to Date: when deciding when Expires: # occurs (cknite) # UPDATE - Actually, I think it is only supposed to work if the browser # identifies itself as HTTP/1.1. Maybe we could re-enable this # for HTTP/1.1 browsers (johnnyb) # $r->no_cache(1); #send the header $r->send_http_header(); #we're done if they only want the header return OK if ($r->header_only()); #If we are using includes, print out the beginning HTML and the <head> #information from the CGI, and include the header if ($addincs) { $r->print("<html>\n<head>\n"); #look through the CGI output, and print out everything between #<head> and </head>. This will probably be the most difficult thing #to do in C. Also, strip out the <html>, <head>, </head>, and #<body> tags as well. These can occur anywhere #within a line. The <body> will be the hardest because the tag #itself can spread multiple lines. while (<$cgi>) { s!<html[^>]*>!!gio; s!</?head[^>]*>!!gio; my $body = s!<body.*?>(.*)!!io; $buf .= $_; if ($body) { $r->print($buf); $buf = $1; last; } } #Close the <head> tag unless otherwise specified by the configuration $r->print("\n</head>\n") unless $r->subprocess_env('KeepHeadOpen'); include($r, $r->subprocess_env('IncludeHeader')); #print the rest of the buffer read from the CGI (we do it now, #because it should only come after the includes. This will print #everything after the body tag that lies on the same or following #line of the body tag $r->print($buf) if ($buf); } #Print out the contents of the CGI while (<$cgi>) { #remove the trailing body and html tags from the CGI output #if we are using includes if ($addincs) { s!</(body|html)[^>]*>!!gio; } #print out the line that we just read $r->print($_); } #If we are using includes, put in the footer, and print the ending #</html> if ($addincs) { include($r, $r->subprocess_env('IncludeFooter')); $r->print("</html>\n"); } #We're done! return OK; } #PURPOSE: Opens a filehandle to a CGI script, also parses the CGI headers # into the request object # #NOTE: In C this will have to use many functions from util_script.c # in the apache source distribution # sub run_cgi { my $r = shift; my( $filename, #name of the executable $uri, #URI requested by the browser $is_sub #not sure what this is for. It is always sent as FALSE ) = @_; local($|); $| = 1; #Fix mod_perl's junk (mod_perl inappropriately sets the #GATEWAY_INTERFACE on non-Apache::Registry scripts. my $old_gw = $ENV{GATEWAY_INTERFACE}; $ENV{GATEWAY_INTERFACE} = 'CGI/1.1'; #Open up a pipe. This will be used to give input to the CGI pipe(RD,WR) || return SERVER_ERROR; #This causes a fork, with the standard output of the forked process #going to the CGI handle. The code withing the braces ({ and }) #is what is executed by the forked process. We will read from #that process through the CGI filehandle unless (open(CGI,'-|')) { #don't need the writing handle to the pipe anymore close(WR); #make STDIN be the read handle of the pipe (so we can get #POST information from Apache) open(STDIN,'<&RD'); #Run in the directory of the file $r->chdir_file($filename); #Run the file exec($filename) || return SERVER_ERROR; } $ENV{GATEWAY_INTERFACE} = $old_gw; #be sure to close the CGI at the end. However, we probably can just #do this within our code without having to specifically register a #cleanup $r->register_cleanup(sub { close(CGI) }); #The RD filehandle was for the forked process, not us, so we can close it close(RD); #Give the CGI any POST data it needs #FIXME - need to do a buffered read so that Apache processes don't get # too big my $read_buffer; $r->read($read_buffer, $r->header_in('Content-Length')); print WR $read_buffer; #we've printed everything we are going to, so we no longer need th WR #filehandle close(WR); #This reads the header data from the CGI script. It has to handle #both CRLF linebreaks and plain LF linebreaks (because Apache handles #both) { #Set linebreak to LF only local $/ = "\n"; #get the first line $_ = <CGI>; #Check for CRLF linebreaks $/ = "\r\n" if (substr($_, -2, 1) eq "\r"); #Check to see if the CGI printed a status line. If so, use it if (m!^HTTP/\S+\s+((\d+).*)!o) { unless ($is_sub) { $r->status_line($1); $r->status($2); } #Read in the next line for processing $_ = <CGI>; } do { #Quit if this is the end of the headers last if ($_ eq $/); #remove the LF or CRLF chomp; #Give apache the header line unless ($is_sub) { $r->cgi_header_out(split(/:\s*/,$_,2)); } } while (<CGI>); } #Return the file handle for further processing return \*CGI; } #PURPOSE: This subroutine handles all the subrequests for included pages sub include { my $r = shift; my( $uri, #URI of the request $no_run_request #set to TRUE if the request is to only be looked #up and not run ) = @_; #get the subrequest object from apache my $subr = $r->lookup_uri($uri); # Set a note to keep the template from running on the subrequest # (this is a feeble hack to attempt to get rid of the infinite # template bug). $subr->notes->{'NoTemplate'} = 1; #The subrequest doesn't get any POST data $subr->header_in('Content-Length' => undef); #Run the subrequest unless $no_run_request is set to TRUE if ($no_run_request) { return $subr; } else { return $subr->run(); } } #PURPOSE: This prints the html header. This doesn't really need to # be a function, and should probably just be included in the main # code sub print_head_html { my $r = shift; $r->print(<<EOF); <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <html> <head> EOF } 1; __END__ =head1 NAME Template.pm - Module that implements the wolfram templating system =head1 AUTHOR Daryn Sharp - modified extensively by Jonathan Bartlett, Chris Knight, and Max Campos =head1 SYNOPSIS In your httpd.conf file PerlRequire /www/perl/libs/WRI/Template.pm PerlFixupHandler WRI::Template This should be the last PerlFixupHandler in the chain In your .htaccess file PerlSetVar AutoIncludes on PerlSetVar IncludeHeader /includes/header.cgi PerlSetVar IncludeFooter /includes/footer.cgi PerlSetVar handle:server-parsed yes PerlSetVar handle:cgi-script yes And, if you don't want your page to be cached, you can say PerlSetVar NoCache 1 This module listens to the environment variables =over 4 =item * WRI_IS_ROBOT =back This module handles the following tasks: =over 4 =item * Attaches headers and footers to both CGI scripts and static or server-parsed html pages =item * Reformats the page to be robot-friendly if the WRI_IS_ROBOT subprocess_env variable is set (NOTE - requires that another module set this variable) by moving headers and footers to the end, and getting rid of images (leaves the alt tags), extra formatting tags, and extra spaces. =back =head1 Implementation =head2 Fixup Stage This module is both a content handler and a fixup handler. However, it is only installed as a fixup handler. In the fixup stage, it checks to see: =over 4 =item * Is it a main request (you don't EVER templatize subrequests, but you do handle non-initial requests, like from a server redirect) =item * Are AutoIncludes turned on? =item * Is this module registered to handle the given content? =back If all of these check out, the template module registers itself as a content handler. It picks the function to register for this task based on whether or not the request is for a script(by checking $r->handler). =head2 Handling Regular Pages (handle_req) First, we issue subrequests (without running them) to find out the last modified time, and then issue the last_modified tag of the latest time. It then reads the files -head file, and prints the HTML header with the contents of -head. If WRI_IS_ROBOT is set, it then prints the main body HTML after running it through the roboticize() function to clean it up a bit. It then runs the subrequest for the header and the footer. Then it prints a final end html tag and returns. =head2 Handling CGI Scripts (handle_cgi) First, this gets the filename of the script, and it sets the header in $r and returns a filehandle to the CGI using the run_cgi() function. It then checks to see if it should add the included files based on what the result code was (if it was a 500 internal server error, we don't want to run the includes). Also, we don't add the includes if the content type isn't text/html. We then send the http header, and finish if its a HEAD request. Then we read in the CGI and parse out the head and body tags, replacing them with what we want (we use the CGI's <title>, though). If WRI_IS_ROBOT is set, we just print a body tag, and save the included header for later. Otherwise, we print the header here. Then we read all of the CGI output. If WRI_IS_ROBOT is set, we roboticize it, otherwise we just print the data. Then, we print the footer. =head1 Supporting Functions =head2 run_cgi This opens a filehandle to a CGI script, parses out the headers, and sets all of the request values based on the headers. It then returns the CGI filehandle. =head2 roboticize This handles all of our robot-friendly conversions, including replacing images with just the alt text, and other similar replacements.
package WRI::Negotiation; use strict; use Apache::Constants qw(:common :methods :http); my $DEBUG = 0; my %charsets = ('ja' => 'Shift_JIS'); sub handler { my $r = shift; $DEBUG = $r->dir_config('NegoDebug'); # we need to negotiate subrequests, except for when it's one of our own # sub-requests (denoted by markingof NegoDisable). $r->log_error("Negotiating: " . $r->uri) if $DEBUG; return DECLINED if (!$r->is_main && $r->main->notes->{'NegoDisable'}); # WRI_PREFERRED_VARIANTS is always set regardless of whether we negotiate # the request or not. my $preferred = GetPreferredVariants($r); $r->subprocess_env('WRI_PREFERRED_VARIANTS' => join(' ', @$preferred)); $r->log_error("Preferred: " . join(' ', @$preferred)) if $DEBUG; if ($r->dir_config('NegoDisable')) { $r->log_error("Declining due to NegoDisable directive") if $DEBUG; return DECLINED; } # if the file exists (and is a file), then here are the possibilities: # 1) It's as a result of this module's internal rediretion # (in which we want to read in the variables set by the last # request's negotiation) # 2) It's as a result of this module's internal subrequests (used to # determine available variants). (should not negotiate, just return) # 3) It was a straight out request that matched a file directly # and the file was one that would have been negotiated # (ie. GET /index.ja.html) # (set PREFERRED and AVAILABLE and VARIANT but do not # actually negotiate). # 4) It was a straight out request that matched a file directly # and the file was one that would NOT have been negotiated # (ie. GET /blah/index.html where index.html is a file in the # directory) (set PREFERRED and return). # 5) Request is a directory. Just set PREFERRED and return. if (-e $r->finfo()) { if (-f $r->finfo()) { $r->log_error("Static request.") if $DEBUG; my $prev; if (($prev = $r->prev) && $prev->notes->{'NegoRedirect'}) { # If this is case 1, restore the variables # do a little environment cleanup... $r->log_error("Restoring variables from previous request.") if $DEBUG; delete $prev->notes->{'NegoRedirect'}; # now retrieve our variables from the previous request. my $var = $prev->subprocess_env('WRI_VARIANT'); $r->subprocess_env('WRI_VARIANT' => $var); $r->subprocess_env('WRI_AVAILABLE_VARIANTS' => $prev->subprocess_env('WRI_AVAILABLE_VARIANTS')); $r->header_out('Content-Language', $var); $r->header_out('Vary', 'accept-language'); $r->header_out('Content-Location', $r->uri()); $r->subprocess_env('WRI_CHARSET' => $charsets{$var}) if $charsets{$var}; } elsif (my ($var) = substr($r->filename, rindex($r->filename, '/') + 1) =~ m!\.(\w+)\.!) { $r->log_error("case 3 for $var") if $DEBUG; # else if the filename appears to be something.aa.ext # it's case 3, and we'll need to set the AVAILABLE_VARIANTS # variable, as well as VARIANT $r->subprocess_env('WRI_VARIANT' => $var); my $avail = GetAvailableVariants($r, $var); $r->subprocess_env('WRI_AVAILABLE_VARIANTS' => $avail->{str}); $r->header_out('Content-Language', $var); $r->subprocess_env('WRI_CHARSET' => $charsets{$var}) if $charsets{$var}; } # else it's case 4 and we're done. $r->log_error("case 4 for $var") if $DEBUG; return OK; } else { # the file exists, but is not a regular file. It's probably # a directory (case 5). Don't bother! $r->log_error("Non-file request. Aborting.") if $DEBUG; return OK; } } else { # file does not exist, it's possible that this is some weird virtual # filename, or it truely needs negotiation. # if it's a request for something.ja.ext and it doesn't exist, # do not negotiate. This indicates a direct request of a # specific variant, but is a 404. return DECLINED if substr($r->filename, rindex($r->filename, '/') + 1) =~ m!\.(\w+)\.!; # return declined if it's not in the form of filename.ext # (we do not negotiate filenames w/o extensions). return DECLINED if substr($r->filename, rindex($r->filename, '/')+1) !~ /.+\..+/; my $available = GetAvailableVariants($r); unless ($available->{error}) { # it truely is a negotiated document. Pick out the most # preferred variant. while (my $v = shift @$preferred) { $r->log_error("Is $v available?") if $DEBUG; if ($available->{var}{$v}) { $r->log_error("Using variant $v, preparing redirect.") if $DEBUG; $r->subprocess_env('WRI_VARIANT' => $v); $r->subprocess_env('WRI_AVAILABLE_VARIANTS' => $available->{str}); # we cannot redirect here because internal_redirect # can only be called in the response phase. $r->notes->{'NegoRedirectTo'} = $available->{var}{$v}{uri}; $r->set_handlers('PerlHandler' => [\&redirect_handler]); $r->handler('perl-script'); # similarly, some things (namely mod_dir) need to # have an actual real filename set in order to know # that a subrequest was successful. $r->filename($available->{path} . $available->{var}{$v}{file}); return OK; } $r->log_error("$v not available, get next variant") if $DEBUG; } # eventually, this should return some sort of page # that lets the user chose between the available # variants, for now, we'll just return declined return OK; } else { $r->log_error($available->{error}); return OK; } return OK; } } sub redirect_handler { # we're in the response phase, it's probably because the current # url was negotiated and needs an internal_redirect that can happen # only in this phase. my $r = shift; if (my $to = $r->notes->{'NegoRedirectTo'}) { $r->log_error("Redirecting...") if $DEBUG; # remove the temp environment variable delete $r->notes->{'NegoRedirectTo'}; # set a new temp environment variable to get the module # to grab the variables from the previous request. $r->notes->{'NegoRedirect'} = 1; $r->internal_redirect($to); return DONE; } else { return DECLINED; } } sub GetAvailableVariants { # $myvar is only passed when the uri/filename already include the # variant $myvar. This is usually not the case. Normally # $myvar is undef. my ($r, $myvar) = @_; my $filename = $r->filename; # break the filname into path and file pieces. my %available; $available{path} = substr($filename, 0, rindex($filename, '/') + 1); my $file = substr($filename, length($available{path})); my $uri = $r->uri; $available{var} = {}; # if this is a lingual request, the we need to rewrite it so that # it appears to not be. if ($myvar) { $available{var}{$myvar}{uri} = $uri; $available{var}{$myvar}{file} = $file; $available{str} = "$myvar=$file"; $file =~ s!\.$myvar\.!\.!; $uri =~ s!\.$myvar\.!\.!; } $available{file} = $file; # Set this variable to keep the subrequest from performing # negotiation (negotiation is not needed for these lookup_uris # and could cause infinite loops). # cycle through all of our available variants # and figure out which ones exist. my $newfile; my $newuri; my $v; foreach $v (split ' ', $r->dir_config('NegoAvailableVariants')) { # we don't need to check to see if the current request # exists. next if $v eq $myvar; $newfile = $file; $newuri = $uri; unless ($newfile =~ s!(.+\.)(.+)$!$1$v.$2! && $newuri =~ s!(.*/)($file)!$1$newfile!) { $available{error} = "Could not look for variants for '$uri'."; return \%available; } $r->notes->{'NegoDisable'} = 1; my $subr = $r->lookup_uri($newuri); delete $r->notes->{'NegoDisable'}; # this is the same test that mod_dir does to see if a URL # succeeded or not. if ($subr->status == HTTP_OK && (-e $subr->finfo())) { $r->log_error("Found variant $v") if $DEBUG; $available{var}{$v}{file} = $newfile; $available{var}{$v}{uri} = $newuri; $available{str} .= ($available{str} ? ' ' : '') . "$v=$newfile"; } } return \%available; } sub GetPreferredVariants { my $r = shift; # now let's figure out what the order of preference is. # and select our return variant at the same time. my @preferred; # list of variants in preferential order my $v; foreach $v (split ' ', $r->dir_config('NegoMatchOrder')) { if ($v eq 'Browser') { # browser specified language preferences push @preferred, split(',', $r->headers_in->{'Accept-language'}); } elsif ($v =~ /Host\((.+)\)/) { # if it's a Host(something.com=en) block. # NegoHostMatch preferences. NegoHostMatch # has the following syntax: # NegoHostMatch www.wolfram.co.uk=en www.wolfram.co.jp=ja www.wolfram.com=en my $host = $r->headers_in->{'Host'}; my ($hvar) = $v =~ /Host\($host=([\w-]+)\)/; push @preferred, $hvar if ($hvar); } else { push @preferred, $v if $v; } } return \@preferred; } 1; __END__ =head1 NAME Negotiation.pm - Module to handle lingual content negotiation reasonably =head1 AUTHOR Max Campos =head1 SYNOPSIS In your .htaccess file ... PerlRequire /www/perl/libs/WRI/Negotiation.pm PerlFixupHandler WRI::Negotiation PerlSetVar NegoAvailableVariants "en ja" PerlSetVar NegoMatchOrder "Host(www.wolfram.co.jp=ja) Browser en ja" =head1 WHAT IT DOES =head2 Definitions "lingual document" - A document that has a language extension. (such as: whatever.ja.html, foo.en.cgi, blech.pr.php3). "negotiated request" - An incoming request for non-lingual document in which the non-lingual document does not exist but lingual version(s) do. (ie. a request for something.html, and something.en.html, something.ja.html exist but something.html does not) "lingual document request" - an incoming request for a lingual document. (ie. an incoming request for foo.ja.html). =head2 Configuration Directives =over 4 =item NegoDisable 1 Disable negotiation for this directory and all directories below it. =item NegoAvailableVariants ("en ja") list of variants to check for when determining what files are available/etc. =item NegoMatchOrder ("Browser HostMatch Default") The order in which to select variants (as well as set the WRI_PREFERRED_VARIANTS variable). It's a space separated list with each item being one of three things: =over 4 =item Browser Adds browser provided Accept-Language variants to the preference list. =item Host(hostname.something.com=vr) Adds variant vr to the preference list if the client present's a Host: header that matches hostname.something.com. =item ja Anything that does not match the two above specifications is added to the variant list as is. =back =back =head2 Environment Variables =over 4 =item WRI_AVAILABLE_VARIANTS ja=index.ja.cgi en=index.en.cgi This is a list of all of the available variants for the current file, and their filenames. Presented To: Negotiated requests, lingual document requests =item WRI_PREFERRED_VARIANTS kabuki en pr en ja =over 4 =item * This is a list of variants in preferential order as determined by the NegoMatchOrder directive (this could be any combination of host based settings, browser presented languages, as well as the default). =item * This list may include languages that we have never heard of. =item * This list may include repeats. =item * Since this is not based on what is available, we can provide this to all requests. =back Presented to: *all* requests. =item WRI_VARIANT en =over 4 =item * This is the language of the current request. =item * Essentially, this is the first element of WRI_PREFERRED_VARIANTS that is also listed in WRI_AVAILABLE_VARIANTS =back Presented To: Negotiated requests, lingual document requests =back =head1 REQUEST PROCESSING =head2 Process I<* set WRI_PREFERRED_VARIANTS> - Does the file requested exist? Yes? Do not negotiate. (DONE) No? Continue... - Is it a non-lingual request? Yes? Continue... No? 404. - Do variants exists? Yes? Continue... No? 404. I<* set WRI_AVAILABLE_VARIANTS> - Does an appropriate variant for the file exist? Yes? set WRI_VARIANT and return it No? 404. =head2 Example Scenarios -> Request for index.html, where only index.html, index.ja.html, index.en.html exist. <- index.html is returned. B<------------------------------> -> Request for index.html, where only index.ja.html index.en.html exist. <- Most appropriate document is returned (or 404 if no appropriate B<------------------------------> -> Request for index.ja.html where index.html index.ja.html index.en.html exist. <- index.ja.html is returned. -> Request for index.ja.html where index.html index.en.html exist. <- 404. =cut