Hi,

I tested this and can confirm the behavior. I created a patch that
fixes this issue as well as a second one, which occurs when the
content type is not sent via an HTTP header but is only present in the
XML declaration.

Please review this at http://codereview.appspot.com/1952044/

Cheers

Bastian

2010/8/18 Paul Lindner <[email protected]>:
> I'm not sure if you read this.  Do you think it's a correct assessment?
> ---------- Forwarded message ----------
> From: Justin Wyllie <[email protected]>
> Date: Tue, Jul 20, 2010 at 4:18 AM
> Subject: Bug in PHP Shindig: non UTF-8 gadgets lose all non asci characters
> To: [email protected]
>
>
>
> The original problem which I posted to the users list was that gadgets with
> non UTF-8 encodings (I used iso-8859-1 to test) were losing all non ascii
> characters in both the title (metadata call) and content (gadget rendering
> call).
> Details of the problem and solution are as follows:
>
> In BasicRemoteContentFetcher this line:
>     $content = mb_convert_encoding($content, 'UTF-8', $charset);
> converts the fetched XML as a string to UTF-8 whatever encoding it was in.
> ($charset is the source encoding)
> But the xml declaration line was not touched. So, after this we may have a
> gadget like this:
> <?xml version="1.0" encoding="iso-8859-1"?>
> <Module>
>   <ModulePrefs title="IñtërnâtiônàlizætiønX" />
>   <Content type="html">
>     <![CDATA[
>     ]]>
>   </Content>
> </Module>
> which is UTF-8 encoded but with an iso-8859-1 encoding attribute.
> Later in the call (metadata request or gadget rendering) in
> GadgetSpecParser->parse() we load the XML content into an XML DOM object. At
> this point the error occurs - naturally as the UTF-8 content is flagged as
> being in iso-8859-1.
> My fix is as follows:
> In BasicRemoteContentFetcher->parseResult replace:
> $content = mb_convert_encoding($content, 'UTF-8', $charset);
> with
>     $content = mb_convert_encoding($content, 'UTF-8', $charset);
>     $pattern = 'encoding=\s*([' . '\'"])' . $charset . '\s*\1';
>     $content = mb_ereg_replace($pattern, 'encoding="UTF-8"', $content, "i");
> Now the XML is UTF-8 encoded and has the correct UTF-8 encoding attribute.
> Justin
>
>
> --
> Paul Lindner -- [email protected] -- linkedin.com/in/plindner
>
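
For reference, the two cases discussed in this thread (a stale encoding
attribute after conversion, and a charset that is only present in the XML
declaration) can be sketched in one helper. This is a minimal sketch under
stated assumptions, not the patch under review: normalizeXmlToUtf8() and its
$headerCharset parameter are hypothetical names, and it uses preg_replace
instead of the mb_ereg_replace call from the quoted fix.

```php
<?php
// Hypothetical helper (not part of Shindig): convert fetched gadget XML to
// UTF-8 and make the XML declaration agree with the new encoding.
function normalizeXmlToUtf8(string $content, ?string $headerCharset = null): string
{
    // Captures: 1 = everything up to "encoding=", 2 = quote char, 3 = charset name.
    $declPattern = '/^(\s*<\?xml\b[^>]*?\bencoding\s*=\s*)(["\'])([A-Za-z0-9._-]+)\2/i';

    // Prefer the charset from the HTTP header; otherwise fall back to the
    // one named in the XML declaration.
    $charset = $headerCharset;
    if ($charset === null && preg_match($declPattern, $content, $m)) {
        $charset = $m[3];
    }
    if ($charset === null) {
        $charset = 'UTF-8'; // assumption: treat undeclared content as UTF-8
    }

    // Convert the bytes only when the source is not already UTF-8.
    if (strcasecmp($charset, 'UTF-8') !== 0) {
        $content = mb_convert_encoding($content, 'UTF-8', $charset);
    }

    // Rewrite the declaration so the parser is not told the wrong encoding.
    return preg_replace($declPattern, '${1}"UTF-8"', $content, 1) ?? $content;
}
```

Called as normalizeXmlToUtf8($content, $charset) with the header charset, or
as normalizeXmlToUtf8($content) when no charset header was sent. Anchoring
the pattern to the declaration at the start of the document avoids touching
an "encoding=" string that happens to appear in the gadget body.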
