Re: UTF8 fun with SOAP::Lite and mod_perl 1.3.33

Drew Wilson Fri, 16 Mar 2007 17:46:00 -0800

FWIW - I did try forcing the page encoding, but this didn't turn out to be necessary as the XML is already UTF-8.


Drew


On Mar 16, 2007, at 2:27 PM, Chris Jacobson wrote:

FWIW, if you tell the client to render the page as UTF-8, your 'broken' mod_perl version works correctly. The content-type header is instructing the client to render the page using ISO-8859-1, which will result in gremlin characters being rendered.
Aaron Hawryluk wrote:
This is suspiciously similar to the problem I had with double-byte characters coming up where single-byte characters were expected. If you find the answer to this, could you let me know? I still can't migrate to mod_perl due to the problem. Mind you, I'm on Apache2/mp2, so they could be completely unrelated...
Here's a sample of what happens:
Here it is under my old CGI model (which is now far too CPU-intensive):
http://www.calgarysun.com/cgi-bin/publish.cgi?p=171082&x=articles&s=showbiz
And here it is under mod_perl:
http://www.calgarysun.com/perl-bin/publish.cgi?p=171082&x=articles&s=showbiz
Hey! Mod_perl guys! Can you say "reproducibility"?
--Aaron Hawryluk
Webmaster, The Calgary Sun
http://www.calgarysun.com
[EMAIL PROTECTED]
Ph: 403-250-4371
-----Original Message-----
From: Drew Wilson [mailto:[EMAIL PROTECTED]
Sent: March-16-07 1:15 PM
To: modperl mod_perl
Subject: UTF8 fun with SOAP::Lite and mod_perl 1.3.33
I'm trying to track down a Unicode malcoding problem using SOAP::Lite 0.67 with mod_perl 1.29 on apache 1.3.33.
The problem I'm seeing is my UTF8 strings are transformed in the http response.
The strings look correct inside the perl space (e.g. printing to STDERR inside the perl handler) but the strings are converted in the http packet returned (captured using tcpdump).
For example, if I want to send back a string containing the Unicode snowman U+2603 (UTF8 E2 98 83), I manually encode the string as:
            my $snowman = '☃';
            my %result = ( 'snowman' => SOAP::Data->type( string => $snowman ) );
and return it
            return %result;
When watching with tcpdump, I expect to see this UTF8 byte sequence:
         e2 98 83
but instead see
        c3 a2 c2 98 c2 83
I suspect the UTF8 byte sequence is being treated as a UTF-16 sequence [00 e2 00 98 00 83], which is then converted to the UTF8 equivalent byte sequence [c3 a2 c2 98 c2 83].
But I cannot figure out WHERE this conversion is being done.
Is there any way to trace data being written to the response?
BTW - the $snowman string returns 1 for utf8::is_utf8 and utf8::valid.
Thanks for any suggestions,
Drew
--
____________________________________________________________________
Chris Jacobson                         Phone: (513) 665-9070 x310
Online-Rewards                         Fax  : (214) 242-4448
403 Vine Street, Second Floor          http://www.online-rewards.com
Cincinnati, OH 45202
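
The byte sequences quoted in the original message can be reproduced outside of SOAP::Lite and mod_perl. The following is only a minimal sketch of one plausible mechanism, not a confirmed diagnosis of where the conversion happens: it assumes the already-encoded UTF-8 bytes are read as ISO-8859-1 characters (which gives the same code points as the [00 e2 00 98 00 83] reading above) and are then encoded to UTF-8 a second time. The helper name hex_bytes is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2603 SNOWMAN as a Perl character string (a single character).
my $snowman = "\x{2603}";

# Correct UTF-8 encoding of that character: three bytes.
my $utf8_bytes = encode('UTF-8', $snowman);

# Assumed fault: the three UTF-8 bytes are misread as three ISO-8859-1
# characters (U+00E2, U+0098, U+0083) and encoded to UTF-8 again.
my $double = encode('UTF-8', decode('ISO-8859-1', $utf8_bytes));

# Hypothetical helper: format a byte string as space-separated hex.
sub hex_bytes { join ' ', map { sprintf '%02x', ord } split //, shift }

print hex_bytes($utf8_bytes), "\n";   # e2 98 83
print hex_bytes($double), "\n";       # c3 a2 c2 98 c2 83
```

If the second line matches what tcpdump shows, the response bytes are consistent with a second UTF-8 encoding pass being applied to data that was already UTF-8. The sketch does not show which layer performs that pass.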
