Actually, Apache has:

AddDefaultCharset ISO-8859-1

And my content-type header meta tag in the HTML output also reads <meta
http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

Header looks like this under regular CGI:

HTTP/1.1.200.OK(CR)(LF)
Date:.Tue,.13.Feb.2007.23:21:28.GMT(CR)(LF)
Server:.Apache(CR)(LF)
Set-Cookie:.publish=;.domain=.lfpress.com;.path=/cgi-bin;.expires=Wed,.14-Feb-2007.23:21:29.GMT(CR)(LF)
Connection:.close(CR)(LF)
Transfer-Encoding:.chunked(CR)(LF)
Content-Type:.text/html;.charset=ISO-8859-1(CR)(LF)
(CR)(LF)

And like this under mod_perl:

HTTP/1.1.200.OK(CR)(LF)
Date:.Tue,.13.Feb.2007.23:22:15.GMT(CR)(LF)
Server:.Apache(CR)(LF)
Set-Cookie:.publish=;.domain=.lfpress.com;.path=/cgi-bin;.expires=Wed,.14-Feb-2007.23:22:15.GMT(CR)(LF)
Connection:.close(CR)(LF)
Transfer-Encoding:.chunked(CR)(LF)
Content-Type:.text/html;.charset=ISO-8859-1(CR)(LF)
(CR)(LF)

So it can't be the header... this is getting truly bizarre. The system
default charset on the Linux box is ISO-8859-1, and MySQL is using ISO-8859-1
as its default charset.  Dunno what else to check.

Here's another weird thing - the characters aren't showing up as encoded
entities under either regular CGI or mod_perl - they are actual raw
characters, not escaped or encoded.  Under regular CGI, a caption line shows
up as:

Dennis.Garnhum,.former.mentor/instructor.at.the.National.Theatre.School,.says.that.(93)people.who.have.a.burning.passion.that.can(92)t.be.stopped(94).are.most.likely.to.succeed.as.actors.

Under mod_perl, it becomes:

Dennis.Garnhum,.former.mentor/instructor.at.the.National.Theatre.School,.says.that.(C2,93)people.who.have.a.burning.passion.that.can(C2,92)t.be.stopped(C2,94).are.most.likely.to.succeed.as.actors.

BTW I am using an HTTP viewer for this:
http://www.rexswain.com/httpview.html

And the URLs I am using are:
Regular:
http://www.calgarysun.com/cgi-bin/publish.cgi?p=171767&x=articles&s=events
Mod_perl:
http://www.calgarysun.com/perl-bin/publish.cgi?p=171767&x=articles&s=events

It looks like mod_perl is emitting two bytes where a single byte should
go.  I checked the ASCII and ISO Latin-1 tables, and hex 92, 93 and 94 are
not supposed to be printable characters there (Latin-1 reserves that range
for control codes) - and yet when I check the character table in my text
editor (UltraEdit), they show up as characters. Is this the Windows codepage
(1252, I think) in action, extending my ASCII set? And yet the characters
show up fine under regular CGI on Fedora... for some reason mod_perl just
seems to be adding a hex C2 before every non-ASCII character.  Is it an
escape sequence messing up or.... ?
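
For anyone following along, here is a minimal sketch of one way the C2
prefix can arise. It is written in Python purely for illustration (the
publishing system itself is Perl), and it assumes the stored bytes are
Windows-1252 "smart quotes" that somewhere get treated as ISO-8859-1 text
and then written out as UTF-8. That is a guess at the mechanism, not a
confirmed diagnosis of what mod_perl does here. The helper name
hexdump_non_ascii is made up for this example, and its output only
approximates the HTTP viewer's format.

```python
# Assumption: the caption bytes are Windows-1252 smart quotes (0x93, 0x92, 0x94),
# matching the bytes seen in the regular CGI dump.
raw = b"that \x93people who have a burning passion that can\x92t be stopped\x94 are"

# Read as Windows-1252, the bytes are printable curly quotes.
as_cp1252 = raw.decode("cp1252")

# Read as ISO-8859-1, the same bytes map to control characters U+0092..U+0094.
as_latin1 = raw.decode("iso-8859-1")

# Encoding those code points as UTF-8 takes two bytes each:
# 0xC2 followed by the original byte value.
reencoded = as_latin1.encode("utf-8")


def hexdump_non_ascii(data: bytes) -> str:
    """Show ASCII bytes as-is and every other byte as (HH)."""
    return "".join(chr(b) if b < 0x80 else "(%02X)" % b for b in data)


print(hexdump_non_ascii(raw))        # bytes as served under regular CGI
print(hexdump_non_ascii(reencoded))  # bytes after the Latin-1 -> UTF-8 round trip
```

If that is what is happening, the C2 is not an escape sequence at all: it
is the lead byte of a two-byte UTF-8 sequence, which a browser told to
expect ISO-8859-1 will display as a stray extra character.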

*scratches head 'till the blood comes*

-----Original Message-----
From: Jonathan Vanasco [mailto:[EMAIL PROTECTED] 
Sent: February-08-07 1:42 PM
To: mod_perl List
Subject: Re: Strange characters in output when filtered through mod_perl


Just to clarify:

On Feb 8, 2007, at 3:03 PM, Aaron Hawryluk wrote:

> Our publishing system doesn't use any strange character sets -

Your system is working with data in one character set, and publishing  
it to the web in another character set.  The fix is *likely* just  
setting the right character set header in apache.  Personally, I  
either do everything in UTF8 or ASCII with html entities for  
everything else.

You could try doing:
        AddDefaultCharset utf-8
in httpd.conf

or (i think this will work)
        $r->content_type("text/html; charset=utf-8");
in your handler
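
The "ASCII with html entities for everything else" approach mentioned above
can be sketched as follows. This is an illustration in Python rather than
the Perl a mod_perl handler would use, the function name
to_ascii_with_entities is hypothetical, and the cp1252 default is an
assumption about the source data rather than something established in this
thread.

```python
def to_ascii_with_entities(raw: bytes, source_charset: str = "cp1252") -> bytes:
    """Decode legacy bytes, then emit pure ASCII with numeric character references."""
    # Assumption: the stored bytes really are in source_charset.
    text = raw.decode(source_charset)
    # Any character outside ASCII becomes a reference of the form &#NNNN;
    return text.encode("ascii", "xmlcharrefreplace")


caption = b"can\x92t be stopped\x94"
print(to_ascii_with_entities(caption))
```

Output produced this way contains only ASCII bytes, so it renders the same
whichever charset the Content-Type header declares.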


// Jonathan Vanasco
