I finally got around to converting our Apache::ASP application so that it uses UTF-8 throughout, instead of Latin-1. I learned a few things that aren't discussed in the archives, so I'm setting them down here for others to find.

1. Use as new a Perl as you can. 5.8.0 is adequate, but it has known bugs in its Unicode handling. Under 5.8.0, our program shows a double UTF-8 conversion on one screen, while the other screens show the data correctly. Under 5.8.5, every screen shows the correct data. While it's theoretically possible to get Perl 5.6.x to cope with UTF-8 data, I don't recommend trying. A few years ago, when I first tried using UTF-8, I was on 5.6 and had many problems with Perl incorrectly smashing my data back to Latin-1.
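If you want a script or module to refuse to run on an older Perl, a `use VERSION` line does that at compile time. A minimal sketch; the 5.8.5 floor is just the newest version mentioned above that worked for us, not a hard requirement.

```perl
# Refuse to compile under anything older than Perl 5.8.5;
# 5.8.0 in particular has known Unicode bugs.
use 5.008005;
use strict;
use warnings;

# $^V holds the running Perl's version; %vd prints it as e.g. "5.8.5".
printf "Running under Perl %vd\n", $^V;
```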

2. Also use the newest mod_perl you can. There are known Unicode bugs in mod_perl 1.99_09 and older.

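3. You must say "use utf8;" at the top of each ASP file. The pragma tells Perl that the file's own source text is UTF-8, and it only covers the file it appears in, so if you use $Response->Include(), each included file also has to say "use utf8;". The same goes for any Perl module whose source contains UTF-8 text, such as non-ASCII string literals or regular expressions.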
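The effect of the pragma is easy to see in a standalone file. A minimal sketch; the package name My::Greeting is hypothetical, and the file itself must be saved as UTF-8.

```perl
package My::Greeting;    # hypothetical module, for illustration only
use strict;
use warnings;
use utf8;    # the string literals in this file are UTF-8 text, not raw bytes

# With "use utf8" this literal is 5 characters. Without it, Perl would read
# the 7 UTF-8 bytes of "Grüße" as 7 separate Latin-1 characters.
sub greeting { return "Grüße" }

# In an .asp page the same pragma goes in the first code block:
#   <% use utf8; %>

package main;

# In a standalone script, characters must be encoded again on the way out,
# hence the UTF-8 output layer on STDOUT.
binmode STDOUT, ':encoding(UTF-8)';
my $text = My::Greeting::greeting();
print "$text has ", length($text), " characters\n";
```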

4. mod_perl doesn't set the LANG environment variable unless you ask it to. Perl 5.8.0 uses the locale environment variables (LANG among them) to decide whether to use UTF-8 by default; Perl 5.8.1 and newer no longer do this automatically and rely on the -C switch or the PERL_UNICODE environment variable instead. I didn't find it necessary to pass LANG through in my program, but it can't hurt. If nothing else, it's one less thing to blame if your pages aren't showing the right data. In your httpd.conf, right after "PerlModule Apache::ASP", add "PerlPassEnv LANG". This passes the value of LANG from the environment Apache was started in through to the mod_perl interpreters, and thus to Apache::ASP.
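In httpd.conf the two lines sit together. A sketch of the fragment; the commented-out PerlSetEnv line is an alternative for setting LANG explicitly rather than inheriting it, and the locale name in it is only an example that must exist on your system.

```
# Load Apache::ASP, then pass LANG from Apache's startup environment
# through to the Perl interpreters.
PerlModule Apache::ASP
PerlPassEnv LANG

# Alternative: set LANG explicitly instead of inheriting it.
# en_US.UTF-8 is an example locale name; use one installed on your system.
#PerlSetEnv LANG en_US.UTF-8
```

Printing <%= $ENV{LANG} %> from a test page is a quick way to confirm that the value arrived.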

5. Ensure that your data source is passing UTF-8 data correctly. In our program, the data comes in as XML, so we had to tell the XML parser that the data is UTF-8. Otherwise, the parser assumes it's Latin-1, and you get a double UTF-8 conversion.
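The double conversion is easy to reproduce outside the web server. A minimal sketch using the core Encode module; the byte string stands in for whatever your data source delivers, and no XML parser is involved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "é" as it arrives from the data source: the two UTF-8 bytes 0xC3 0xA9.
my $bytes = "\xC3\xA9";

# Right: decode the bytes as UTF-8 exactly once, giving one character (U+00E9).
my $right = decode('UTF-8', $bytes);

# Wrong: treat the same bytes as Latin-1, giving two characters (U+00C3 U+00A9).
my $wrong = decode('ISO-8859-1', $bytes);

# Encoding for output turns the wrong string into four bytes, which a browser
# told to expect UTF-8 shows as "Ã©": the double UTF-8 conversion.
my $right_out = encode('UTF-8', $right);
my $wrong_out = encode('UTF-8', $wrong);

printf "right: %d char(s) -> %d byte(s)\n", length $right, length $right_out;
printf "wrong: %d char(s) -> %d byte(s)\n", length $wrong, length $wrong_out;
```

This prints "right: 1 char(s) -> 2 byte(s)" and "wrong: 2 char(s) -> 4 byte(s)".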

6. Finally, you need to tell the browser that the data is UTF-8. This is done with the charset parameter of the Content-Type HTTP header, which you can set in a number of ways. I like to use a <meta> tag, which browsers treat as equivalent to the header, at the top of each file that will contain UTF-8 data:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
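To send the real HTTP header from a page instead of (or as well as) the <meta> tag, Apache::ASP's $Response object has ContentType and Charset properties modeled on classic ASP's. A sketch of a page, assuming $Response->{Charset} behaves as in the Apache::ASP documentation for your version; check it there before relying on it.

```
<%
    use utf8;                         # this page's source text is UTF-8
    $Response->{Charset} = 'utf-8';   # should yield "Content-Type: text/html; charset=utf-8"
%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>UTF-8 test page</title>
</head>
<body>
</body>
</html>
```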

Alternatively, if all documents on your server should be treated as UTF-8, Apache's AddDefaultCharset directive can declare UTF-8 for all of them at once.
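The server-wide setting is one line in httpd.conf. A sketch; AddDefaultCharset only fills in a charset for responses that don't already declare one in their Content-Type header, and in Apache 2 it applies only to text/html and text/plain responses.

```
# Declare UTF-8 for responses that don't name a charset of their own.
AddDefaultCharset UTF-8
```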
