Re: UTF-8 encoding problems under Apache 2 with mod_perl 2.

Jeff Nokes Wed, 04 Apr 2007 20:34:01 -0700

Well, we completely separate our content from our presentation templates, and 
both from our source code.  We use HTML::Mason mostly as a layer of abstraction 
over mod_perl's raw API, and then use HTML::Template to munge our content into 
our templates, either in a pre-release batch run, dynamically, or both.


We keep all of our strings in versioned content files, in XML format, something 
like the following:

    <str id="landHelp.004">
        <content>Here's how to do it:</content>
    </str>
...
    <str id="commChat.217">
        <content>How to participate</content>
    </str>

This is an example of a US English XML string file.  Every locale we support 
has its own string file, with the US English one as the base, meaning we always 
translate from enUS to the other locale.  So the traditional Chinese XML file 
would have the equivalent strings for those example string IDs between the 
<content></content> tags, but in Chinese; the same goes for Polish, etc.
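One rough way to load a string file into a lookup table, sketched in Perl under some assumptions: the helper names are hypothetical, the regex only copes with the flat layout shown above (a real XML parser such as XML::LibXML is the safer choice), and falling back to the enUS text for missing IDs is an illustrative choice, not necessarily what this setup does. In real use the XML would be read from a file opened with a UTF-8 layer, as described further down.

```perl
use strict;
use warnings;

# Hypothetical helper: collect id => content pairs from a string file laid out
# like the example above. A regex is NOT a real XML parser: it handles only this
# flat <str id="..."><content>...</content></str> shape and decodes no entities.
sub parse_strings {
    my ($xml) = @_;
    my %str;
    while ($xml =~ m{<str\s+id="([^"]+)">\s*<content>(.*?)</content>\s*</str>}gs) {
        $str{$1} = $2;
    }
    return \%str;
}

# Assumption (not from the original setup): if a locale file lacks an ID,
# fall back to the enUS base text rather than failing.
sub with_fallback {
    my ($base, $locale) = @_;
    return { %$base, %$locale };    # locale entries override base entries
}

my $enUS = parse_strings(<<'XML');
    <str id="landHelp.004">
        <content>Here's how to do it:</content>
    </str>
    <str id="commChat.217">
        <content>How to participate</content>
    </str>
XML

# Placeholder "translation" standing in for a real locale file.
my $other = with_fallback($enUS, { 'commChat.217' => '[translated] How to participate' });
print "$_ => $other->{$_}\n" for sort keys %$other;
```

Merging the base hash first and the locale hash second means any ID the translators have not covered yet still renders in English instead of leaving a hole in the page.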

Then, a template might look like:

            <option value="-1">------------------</option>
            <option value="-2"><!-- TMPL_VAR NAME=landHelp.004 --></option>

... etc.  This is just HTML::Template syntax inside a standard HTML template.  
After munging, the output would be:

            <option value="-1">------------------</option>
            <option value="-2">Here's how to do it:</option>
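To make the munging step concrete, here is a tiny stand-in written for illustration only. It is not HTML::Template and does not use its API; the `munge` function and the `%enUS` hash are hypothetical. It just replaces tags of the `<!-- TMPL_VAR NAME=... -->` form shown above with strings from a hash, and dies on an unknown string ID.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the munging step, NOT HTML::Template itself:
# it only understands the <!-- TMPL_VAR NAME=... --> form used above and
# replaces each tag with the matching string, dying on an unknown ID.
sub munge {
    my ($template, $strings) = @_;
    $template =~ s{<!--\s*TMPL_VAR\s+NAME=([^\s>]+)\s*-->}{
        exists $strings->{$1} ? $strings->{$1} : die "No string for ID '$1'\n"
    }ge;
    return $template;
}

my %enUS = ('landHelp.004' => "Here's how to do it:");
my $html = munge(<<'HTML', \%enUS);
<option value="-1">------------------</option>
<option value="-2"><!-- TMPL_VAR NAME=landHelp.004 --></option>
HTML
print $html;
```

Dying on a missing ID is a deliberate choice for a pre-release batch run, where a broken page is better caught before it ships than served with an empty option.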


So, whenever we read in our string files for munging with templates, we tell 
Perl that the file is UTF-8 encoded by opening the file handle with a UTF-8 
layer (e.g. '<:encoding(UTF-8)'), and that's really it; Perl then decodes the 
content into character strings and treats them as such internally unless we 
explicitly state otherwise.  We use the Encode module all the time to convert 
between UTF-8 and Big-5, ISO-8859-2, or whatever else is needed for email 
templates built from the same content.  Of course, the character encoding has 
to be declared correctly in your web page templates, as well as in your emails, 
for all of this to work with the respective clients.  Apache really doesn't 
care; it just sees 8-bit data and serves it to the client.
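A minimal sketch of the reading and converting described above, using only core modules (Encode, File::Temp). The sample text and the temporary file are made up for the demonstration; in a real setup the path would point at one of the XML string files, and the target charset would be whatever the email template declares.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempfile);

# Made-up sample: two CJK characters (U+4E2D U+6587), a space, and Polish
# U+0142. Written as escapes so this script's own source stays plain ASCII.
my $sample = "\x{4E2D}\x{6587} \x{0142}";

# Simulate a string file on disk: raw UTF-8 bytes, as an editor would save it.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} encode('UTF-8', $sample);
close $out or die "close $path: $!";

# Reading: the :encoding(UTF-8) layer decodes the bytes into Perl characters.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
my $text = do { local $/; <$in> };
close $in;
printf "characters read: %d\n", length $text;    # 4, not the 9 bytes on disk

# Writing an email template: encode the characters into the target charset.
# 'big5-eten' is the name Encode uses for common Big5.
my $big5   = encode('big5-eten',  substr($text, 0, 2));
my $latin2 = encode('iso-8859-2', substr($text, 3, 1));
printf "Big5: %d bytes, ISO-8859-2: %d byte\n", length $big5, length $latin2;
```

The pattern is decode once at the input boundary, work with characters inside, and encode once at the output boundary, choosing the charset per output channel.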


I also have the following set in the environment of the user running Apache.  
I've tried commenting it out completely and saw no difference in behavior, but 
I keep it in there for posterity, I guess ... :-)

    # Perl Unicode Support
    # This environment variable makes the entire Perl interpreter in Apache
    # use UTF-8 as the charset for the I/O streams and layers selected below.
    # See `perldoc perlrun` and `perldoc perluniintro` for more details.
    # I     1    STDIN is assumed to be in UTF-8
    # O     2    STDOUT will be in UTF-8
    # E     4    STDERR will be in UTF-8
    # S     7    I + O + E
    # i     8    UTF-8 is the default PerlIO layer for input streams
    # o    16    UTF-8 is the default PerlIO layer for output streams
    # D    24    i + o
    # A    32    the @ARGV elements are expected to be strings encoded in UTF-8
    # L    64    normally the "IOEioA" are unconditional,
    #            the L makes them conditional on the locale environment
    #            variables (the LC_ALL, LC_CTYPE, and LANG, in the order
    #            of decreasing precedence) -- if the variables indicate
    #            UTF-8, then the selected "IOEioA" are in effect
      PERL_UNICODE=SDA
      export PERL_UNICODE
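The letters are bit flags, so SDA works out to 7 | 24 | 32 = 63. The sketch below does that arithmetic from the table (the `unicode_mask` helper is hypothetical) and prints `${^UNICODE}`, the special variable holding the value the running interpreter actually picked up. Logging that variable from code running inside Apache is one possible way to check whether the setting reaches the embedded interpreter at all; that use is an untested suggestion, not part of the setup above.

```perl
use strict;
use warnings;

# Bit values for each PERL_UNICODE / -C letter, copied from the table above.
my %bit = (I => 1, O => 2, E => 4, S => 7, i => 8, o => 16,
           D => 24, A => 32, L => 64);

# OR together the bits for a letter string such as "SDA".
sub unicode_mask {
    my ($letters) = @_;
    my $mask = 0;
    for my $c (split //, $letters) {
        die "Unknown PERL_UNICODE letter '$c'\n" unless exists $bit{$c};
        $mask |= $bit{$c};
    }
    return $mask;
}

printf "SDA => %d\n", unicode_mask('SDA');        # 7 | 24 | 32 = 63
# ${^UNICODE} is what this interpreter actually received at startup.
printf "this perl: \${^UNICODE} = %d\n", ${^UNICODE};
```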


It's important to understand how Perl deals with character data internally, and 
how it uses the UTF-8 flag it sets, etc.  If you haven't already, you should 
read up on it at the following links:

http://search.cpan.org/~jhi/perl-5.8.0/pod/perluniintro.pod
http://search.cpan.org/~jhi/perl-5.8.0/pod/perlunicode.pod
http://search.cpan.org/~dankogai/Encode-2.18/Encode.pm
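A short illustration of the UTF-8 flag behavior those documents describe, using core Encode and the built-in `utf8::is_utf8`: the raw bytes and the decoded string for the same character differ in length and flag, and concatenating them without encoding first produces the classic double-encoding mix-up. The sample character is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $bytes = "\xE4\xB8\xAD";           # the three UTF-8 bytes of U+4E2D, as raw octets
my $chars = decode('UTF-8', $bytes);  # one character, UTF-8 flag on

printf "bytes: length %d, flag %s\n", length $bytes, utf8::is_utf8($bytes) ? 'on' : 'off';
printf "chars: length %d, flag %s\n", length $chars, utf8::is_utf8($chars) ? 'on' : 'off';

# Mixing the two is the classic bug: Perl upgrades the byte string as if it
# were Latin-1, so each byte becomes its own character.
my $mixed = $chars . $bytes;
printf "mixed: length %d characters\n", length $mixed;   # 1 + 3 = 4

# Encoding the character string back to bytes first keeps everything as octets.
my $out = encode('UTF-8', $chars) . $bytes;
printf "encoded: length %d bytes\n", length $out;       # 3 + 3 = 6
```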


Hope this helps you out.
- Jeff



----- Original Message ----
From: Jeff Pang <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, April 4, 2007 7:20:01 PM
Subject: Re: UTF-8 encoding problems under Apache 2 with mod_perl 2.


>
>We also do everything (not source code, which is in ISO-8859-1, only content) 
>in UTF-8 where I work, and we support many different languages.  

Jeff, how did you do it, using UTF-8 for everything? Can you give a rough 
description? Thanks.



--
mailto: [EMAIL PROTECTED]
http://home.arcor.de/jeffpang/
