If I instruct the browser to render as UTF-8, the strange characters disappear, but the proper characters don't show up. Instead I get the gap indicative of a non-rendering character, or nothing at all, depending on the browser (IE and FF do different things here - big surprise).
The problem as I see it is that the system locale is set to ISO-8859-1 (and MySQL should be using the system locale), Apache is set to ISO-8859-1, and yet for some reason UTF-8 (possibly - not necessarily - just double-byte instead of single-byte) is coming out of mod_perl, whereas regular CGI is just pumping out (normal) ISO-8859-1. Switching system locales might have some effect, so I'll test that on a development machine and see what happens. Here goes nothing...

-----Original Message-----
From: Chris Jacobson [mailto:[EMAIL PROTECTED]
Sent: March-16-07 3:27 PM
To: Aaron Hawryluk
Cc: 'Drew Wilson'; 'modperl mod_perl'
Subject: Re: UTF8 fun with SOAP::Lite and mod_perl 1.3.33

FWIW, if you tell the client to render the page as UTF-8, your 'broken' mod_perl version works correctly. The content-type header is instructing the client to render the page using ISO-8859-1, which will result in gremlin characters being rendered.

Aaron Hawryluk wrote:
> This is suspiciously similar to the problem I had with double-byte characters
> coming up where single-byte characters were expected. If you find the answer
> to this, could you let me know? I still can't migrate to mod_perl due to the
> problem. Mind you I'm on Apache2/mp2 so they could be completely unrelated...
>
> Here's a sample of what happens:
>
> Here it is under my old CGI model (which is now far too CPU-intensive):
> http://www.calgarysun.com/cgi-bin/publish.cgi?p=171082&x=articles&s=showbiz
>
> And here it is under mod_perl:
> http://www.calgarysun.com/perl-bin/publish.cgi?p=171082&x=articles&s=showbiz
>
> Hey! Mod_perl guys! Can you say "reproducibility"?
>
>
> --Aaron Hawryluk
> Webmaster, The Calgary Sun
> http://www.calgarysun.com
> [EMAIL PROTECTED]
> Ph: 403-250-4371
>
>
> -----Original Message-----
> From: Drew Wilson [mailto:[EMAIL PROTECTED]
> Sent: March-16-07 1:15 PM
> To: modperl mod_perl
> Subject: UTF8 fun with SOAP::Lite and mod_perl 1.3.33
>
> I'm trying to track down a Unicode malcoding problem using SOAP::Lite
> 0.67 with mod_perl 1.29 on apache 1.3.33.
>
> The problem I'm seeing is my UTF8 strings are transformed in the http
> response.
>
> The strings look correct inside the perl space (e.g. printing to
> STDERR inside the perl handler) but the strings are converted in the
> http packet returned (captured using tcpdump).
>
> For example, if I want to send back a string containing the Unicode
> snowman U2603 (UTF8 E2 98 83), I manually encode the string as:
>     my $snowman = '☃';
>     my %result = ( 'snowman' => SOAP::Data->type( string => $snowman ) );
>
> and return it
>     return %result;
>
> When watching with tcpdump, I expect to see this UTF8 byte sequence:
>     e2 98 83
> but instead see
>     c3 a2 c2 98 c2 83
>
> I suspect the UTF8 byte sequence is being treated as a UTF 16
> sequence [00 e2 00 98 00 83], which is then converted to the UTF8
> equivalent byte sequence [c3 a2 c2 98 c2 83].
>
> But I cannot figure out WHERE this conversion is being done.
>
> Is there any way to trace data being written to the response?
>
> BTW - the $snowman string returns 1 for utf8::is_utf8 and utf8::valid.
>
> Thanks for any suggestions,
>
> Drew
>
>
>

--
____________________________________________________________________
Chris Jacobson                      Phone: (513) 665-9070 x310
Online-Rewards                      Fax  : (214) 242-4448
403 Vine Street, Second Floor       http://www.online-rewards.com
Cincinnati, OH 45202
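
For anyone following the byte sequences quoted above, here is a minimal sketch of how "e2 98 83" can turn into "c3 a2 c2 98 c2 83". It is written in Python rather than Perl purely to show the bytes, and it is not code from anyone's handler. The assumption (not confirmed anywhere in this thread) is that somewhere in the output path each UTF-8 byte gets read as a single ISO-8859-1 character and the result is then encoded as UTF-8 a second time. The helper name hexdump is made up for this illustration.

```python
# Illustration only: reproduces the byte pattern reported in the thread.
# Assumption: the UTF-8 bytes are misread as ISO-8859-1 characters and
# then encoded to UTF-8 again (a "double encoding").

snowman = "\u2603"  # U+2603 SNOWMAN

# The bytes the sender expected to see on the wire.
expected = snowman.encode("utf-8")

# Assumed failure mode: one character per byte, then UTF-8 encode again.
double_encoded = expected.decode("iso-8859-1").encode("utf-8")


def hexdump(data: bytes) -> str:
    """Hypothetical helper: format bytes the way tcpdump output was quoted."""
    return " ".join(f"{b:02x}" for b in data)


print(hexdump(expected))
print(hexdump(double_encoded))

# Undoing the assumed step recovers the original bytes.
repaired = double_encoded.decode("utf-8").encode("iso-8859-1")
print(hexdump(repaired))
```

If this is what is happening, the intermediate code points are U+00E2, U+0098 and U+0083, which is the same set of values as the [00 e2 00 98 00 83] guess in the original message. It would not say where in SOAP::Lite, mod_perl or Apache the extra encoding step occurs.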
