UTF-8 fun [was: UTF8 fun with SOAP::Lite and mod_perl 1.3.33]

Aaron Hawryluk Fri, 16 Mar 2007 14:27:25 -0800

If I instruct the browser to render as UTF-8, the strange characters disappear, 
but the proper characters don't show up - instead I get the gap indicative of a 
non-rendering character or nothing at all, depending on the browser (IE and FF 
do different things here - big surprise).


The problem as I see it is that the system locale is set to ISO-8859-1 (and 
MySQL should be using the system locale), Apache is set to ISO-8859-1, and yet 
for some reason UTF-8 (possibly, not necessarily, just double-byte instead of 
single-byte) is coming out of mod_perl, whereas regular CGI is just pumping out 
(normal) ISO-8859-1.  Switching system locales might have some effect, so I'll 
test that on a development machine and see what happens.  Here goes nothing...

-----Original Message-----
From: Chris Jacobson [mailto:[EMAIL PROTECTED] 
Sent: March-16-07 3:27 PM
To: Aaron Hawryluk
Cc: 'Drew Wilson'; 'modperl mod_perl'
Subject: Re: UTF8 fun with SOAP::Lite and mod_perl 1.3.33

FWIW, if you tell the client to render the page as UTF-8, your 'broken' 
mod_perl version works correctly.  The content-type header is 
instructing the client to render the page using ISO-8859-1, which will 
result in gremlin characters being rendered.
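
As a rough illustration of the point above (a sketch, not code from either site): the same bytes decode to different characters depending on the charset the client is told to use. The sample byte string and the code_points helper are hypothetical, chosen only for the demonstration; the script uses the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample body: the UTF-8 bytes for "e with acute" (U+00E9).
my $body = "\xC3\xA9";

# Hypothetical helper: list the code points of a character string.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

# What a client sees when the header says ISO-8859-1: two gremlin characters.
my $as_latin1 = decode('ISO-8859-1', $body);

# What it sees when told the body is UTF-8: the single intended character.
my $as_utf8 = decode('UTF-8', $body);

print code_points($as_latin1), "\n";   # U+00C3 U+00A9
print code_points($as_utf8),   "\n";   # U+00E9
```

If the output really is UTF-8, the two ways to make header and body agree are to change the declared charset or to convert the output; which one applies here is not established in this thread.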

Aaron Hawryluk wrote:
> This is suspiciously similar to the problem I had with double-byte characters 
> coming up where single-byte characters were expected.  If you find the answer 
> to this, could you let me know?  I still can't migrate to mod_perl due to the 
> problem. Mind you I'm on Apache2/mp2 so they could be completely unrelated...
> 
> Here's a sample of what happens:
> 
> Here it is under my old CGI model (which is now far too CPU-intensive):
> http://www.calgarysun.com/cgi-bin/publish.cgi?p=171082&x=articles&s=showbiz
> 
> And here it is under mod_perl:
> http://www.calgarysun.com/perl-bin/publish.cgi?p=171082&x=articles&s=showbiz
> 
> Hey! Mod_perl guys! Can you say "reproducibility"?
> 
> 
> --Aaron Hawryluk
> Webmaster, The Calgary Sun
> http://www.calgarysun.com
> [EMAIL PROTECTED]
> Ph: 403-250-4371
> 
> 
> -----Original Message-----
> From: Drew Wilson [mailto:[EMAIL PROTECTED] 
> Sent: March-16-07 1:15 PM
> To: modperl mod_perl
> Subject: UTF8 fun with SOAP::Lite and mod_perl 1.3.33
> 
> I'm trying to track down a Unicode mis-encoding problem using SOAP::Lite  
> 0.67 with mod_perl 1.29 on Apache 1.3.33.
> 
> The problem I'm seeing is that my UTF8 strings are transformed in the HTTP  
> response.
> 
> The strings look correct inside the perl space (e.g. printing to  
> STDERR inside the perl handler) but the strings are converted in the  
> http packet returned (captured using tcpdump).
> 
> For example, if I want to send back a string containing the Unicode  
> snowman U+2603 (UTF8 E2 98 83), I manually encode the string as:
>             my $snowman = '☃';
>             my %result = ( 'snowman' => SOAP::Data->type( string => $snowman ) );
> 
> and return it
>             return %result;
> 
> When watching with tcpdump, I expect to see this UTF8 byte sequence:
>        e2 98 83
> but instead see
>       c3 a2 c2 98 c2 83
> 
> I suspect the UTF8 byte sequence is being treated as a UTF-16  
> sequence [00 e2 00 98 00 83], which is then converted to the UTF8  
> equivalent byte sequence [c3 a2 c2 98 c2 83].
> 
> But I cannot figure out WHERE this conversion is being done.
> 
> Is there any way to trace data being written to the response?
> 
> BTW - the $snowman string returns 1 for utf8::is_utf8 and utf8::valid.
> 
> Thanks for any suggestions,
> 
> Drew
> 
> 
> 
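
The six bytes in Drew's tcpdump capture are what you get when each of the three UTF-8 bytes is read as a single ISO-8859-1 character and the result is encoded as UTF-8 a second time. The sketch below reproduces that with the core Encode module. It does not show where in the SOAP::Lite / mod_perl stack the second encoding happens, and the hex_bytes helper is a hypothetical name used only for this demonstration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The snowman (U+2603) as raw UTF-8 bytes, with no character semantics.
my $utf8_bytes = "\xE2\x98\x83";

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

# Misread each byte as one ISO-8859-1 character: U+00E2 U+0098 U+0083.
my $misread = decode('ISO-8859-1', $utf8_bytes);

# Encoding those three characters as UTF-8 yields six bytes.
my $double_encoded = encode('UTF-8', $misread);

# Reversing the two steps recovers the original three bytes.
my $repaired = encode('ISO-8859-1', decode('UTF-8', $double_encoded));

print hex_bytes($utf8_bytes),     "\n";   # e2 98 83
print hex_bytes($double_encoded), "\n";   # c3 a2 c2 98 c2 83
print hex_bytes($repaired),       "\n";   # e2 98 83
```

The round trip at the end is only a diagnostic: if it restores the expected bytes, the data was encoded twice somewhere on the way out.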

-- 
____________________________________________________________________
Chris Jacobson                         Phone: (513) 665-9070 x310
Online-Rewards                         Fax  : (214) 242-4448
403 Vine Street, Second Floor          http://www.online-rewards.com
Cincinnati, OH 45202
