Well, we keep our content, our presentation templates, and our source code
completely separate. We use HTML::Mason mostly as a layer of abstraction over
mod_perl's raw API, and then use HTML::Template to munge our content with our
templates, either in a pre-release batch mode, dynamically, or both.
We keep all of our strings in versioned content files, in XML format, something
like the following:
<str id="landHelp.004">
<content>Here's how to do it:</content>
</str>
...
<str id="commChat.217">
<content>How to participate</content>
</str>
This is an example of a US English XML string file. All of the different
locales we support have their own string files, with the base being the US
English one, meaning we always translate enUS -> other locale. So, for the
traditional Chinese XML file, we would have the equivalent strings for those
example stringIDs in-between the <content></content> tags, but in Chinese, the
same for Polish, etc.
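To make the string-file idea concrete, here is a minimal sketch of loading one locale's strings into a hash keyed by string ID. The parse_strings helper is hypothetical, not our actual loader, and it uses a regex only because the format shown above is so flat; a real XML parser would be the safer choice, and the sketch assumes the file content has already been read and decoded.

```perl
use strict;
use warnings;

# Hypothetical helper: pull every <str id="..."><content>...</content></str>
# pair out of a string file and return a hashref of id => text.
# Assumes the flat layout shown above; it does not unescape XML entities.
sub parse_strings {
    my ($xml) = @_;
    my %strings;
    while ($xml =~ m{<str\s+id="([^"]+)">\s*<content>(.*?)</content>\s*</str>}gs) {
        $strings{$1} = $2;
    }
    return \%strings;
}

my $en_us = <<'XML';
<str id="landHelp.004">
<content>Here's how to do it:</content>
</str>
<str id="commChat.217">
<content>How to participate</content>
</str>
XML

my $strings = parse_strings($en_us);
print "$_ => $strings->{$_}\n" for sort keys %$strings;
```

Each locale's file would yield the same set of keys with translated values, which is what lets the templates stay locale-independent.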
Then, a template might look like:
<option value="-1">------------------</option>
<option value="-2"><!-- TMPL_VAR NAME=landHelp.004 --></option>
... etc. This is just HTML::Template syntax inside of a standard HTML
template. After munging, the result would be:
<option value="-1">------------------</option>
<option value="-2">Here's how to do it:</option>
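The munging step itself can be sketched roughly as follows, assuming HTML::Template is installed. The %strings hash stands in for whatever was loaded from the locale's string file, and the template is passed inline via scalarref just to keep the example self-contained; neither is a description of our actual build scripts.

```perl
use strict;
use warnings;
use HTML::Template;

# Stand-in for the strings loaded from one locale's XML file.
my %strings = ('landHelp.004' => "Here's how to do it:");

my $template_text = <<'HTML';
<option value="-1">------------------</option>
<option value="-2"><!-- TMPL_VAR NAME=landHelp.004 --></option>
HTML

# die_on_bad_params => 0 lets us hand over strings the template never uses.
my $template = HTML::Template->new(
    scalarref         => \$template_text,
    die_on_bad_params => 0,
);
$template->param(%strings);

my $html = $template->output;
print $html;
```

Swapping in a different locale's hash is all it takes to produce the same page in another language.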
So, whenever we read in our string files for munging with templates, we tell
Perl that the file is UTF-8 encoded by opening the file handle with a UTF-8
layer, and that's really it. Perl decodes the content as it is read and treats
it internally as character data unless we explicitly state otherwise. We use
the Encode module all the time to convert between UTF-8 and Big-5, or
ISO-8859-2, or whatever, for email templates with the same content. Of course,
your web page templates and your emails also have to declare their character
encoding properly for this all to work with the respective clients. Apache
really doesn't care; it just sees 8-bit data and serves it to the client.
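As a rough illustration of the two steps above (reading through a UTF-8 layer, then converting with Encode), here is a small sketch. The temporary file and the Polish sample word are made up for the example; the open mode and the encode call are the standard core ones.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Write a sample UTF-8 file so the example is self-contained.
# "\x{141}\x{F3}d\x{17A}" is the Polish city name Lodz with its diacritics.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';
print {$out} "\x{141}\x{F3}d\x{17A}\n";
close $out or die "close failed: $!";

# Tell Perl the file is UTF-8 when creating the file handle.
open my $in, '<:encoding(UTF-8)', $path or die "cannot open $path: $!";
my $city = <$in>;
close $in;
chomp $city;

# Convert the character string to ISO-8859-2 bytes, e.g. for an email body.
my $latin2 = encode('ISO-8859-2', $city);

printf "characters: %d\n", length $city;           # 4
printf "latin2 hex: %s\n", unpack 'H*', $latin2;   # a3f364bc
```

The same $city string could just as well be encoded to Big-5 or back to UTF-8; only the final encode call changes.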
I also have the following set in the ENVironment of the user running apache.
When I comment it out completely I see no difference in behavior, but I keep
it in there for posterity I guess ... :-)
# Perl Unicode Support
# This ENV makes the entire Perl interpreter in Apache use UTF-8 as the
# charset for the following IO layers/streams.
# See `perldoc perlrun` and `perldoc perluniintro` for more details.
#   I    1   STDIN is assumed to be in UTF-8
#   O    2   STDOUT will be in UTF-8
#   E    4   STDERR will be in UTF-8
#   S    7   I + O + E
#   i    8   UTF-8 is the default PerlIO layer for input streams
#   o   16   UTF-8 is the default PerlIO layer for output streams
#   D   24   i + o
#   A   32   the @ARGV elements are expected to be strings encoded in UTF-8
#   L   64   normally the "IOEioA" are unconditional,
#            the L makes them conditional on the locale environment
#            variables (the LC_ALL, LC_CTYPE, and LANG, in the order
#            of decreasing precedence) -- if the variables indicate
#            UTF-8, then the selected "IOEioA" are in effect
PERL_UNICODE=SDA
export PERL_UNICODE
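If you want to check whether a setting like this actually reached a given interpreter, one option is to look at the read-only ${^UNICODE} variable, which holds the numeric form of these flags. The sketch below adds the letter values from the table above; unicode_mask is a hypothetical helper written for this email, not something Perl provides.

```perl
use strict;
use warnings;

# Letter values as listed in perlrun for -C / PERL_UNICODE.
my %FLAG_VALUE = (
    I => 1, O => 2,  E => 4,  S => 7,
    i => 8, o => 16, D => 24, A => 32, L => 64,
);

# Hypothetical helper: combine a letter string such as "SDA" into its bitmask.
sub unicode_mask {
    my ($letters) = @_;
    my $mask = 0;
    for my $letter (split //, $letters) {
        die "unknown flag '$letter'\n" unless exists $FLAG_VALUE{$letter};
        $mask |= $FLAG_VALUE{$letter};
    }
    return $mask;
}

printf "SDA as a number:  %d\n", unicode_mask('SDA');   # 63
printf "this interpreter: %d\n", ${^UNICODE};
```

If the second number is 0, the interpreter you are running in was started without any of these flags in effect.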
It's important to understand how Perl deals with character data internally,
how it uses the UTF-8 flag it sets, etc. If you haven't already, you should
probably read up on it at the following links:
http://search.cpan.org/~jhi/perl-5.8.0/pod/perluniintro.pod
http://search.cpan.org/~jhi/perl-5.8.0/pod/perlunicode.pod
http://search.cpan.org/~dankogai/Encode-2.18/Encode.pm
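As a quick taste of what those documents cover, here is a small sketch of the difference between a byte string and a decoded character string, using only core functions. The sample bytes are the UTF-8 encoding of a four-letter Polish word chosen for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Seven raw bytes: the UTF-8 encoding of a four-character Polish word.
my $bytes = "\xC5\x81\xC3\xB3d\xC5\xBA";

# decode() turns bytes into Perl's internal character representation.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, UTF-8 flag %d\n",
    length $bytes, utf8::is_utf8($bytes) ? 1 : 0;   # length 7, flag 0
printf "chars: length %d, UTF-8 flag %d\n",
    length $chars, utf8::is_utf8($chars) ? 1 : 0;   # length 4, flag 1

# encode() goes the other way and gives back the original bytes.
my $round_trip = encode('UTF-8', $chars);
print "round trip ok\n" if $round_trip eq $bytes;
```

Most encoding bugs I've seen come from mixing these two kinds of strings, so it pays to know which one you are holding at each step.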
Hope this helps you out.
- Jeff
----- Original Message ----
From: Jeff Pang <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, April 4, 2007 7:20:01 PM
Subject: Re: UTF-8 encoding problems under Apache 2 with mod_perl 2.
>
>We also do everything (not source code, which is in ISO-8859-1, only content)
>in UTF-8 where I work, and we support many different languages.
Jeff, how did you do it by using UTF-8 for everything? Can you give a rough
description? Thanks.
--
mailto: [EMAIL PROTECTED]
http://home.arcor.de/jeffpang/