The original problem, which I posted to the users list, was that gadgets with
non-UTF-8 encodings (I tested with iso-8859-1) were losing all non-ASCII
characters in both the title (metadata call) and the content (gadget rendering
call).
Details of the problem and the solution are as follows:

In BasicRemoteContentFetcher this line:
     $content = mb_convert_encoding($content, 'UTF-8', $charset);
converts the fetched XML string to UTF-8, whatever encoding it was originally in
($charset is the source encoding).
But the XML declaration line is not touched. So after this step we may have a
gadget like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<Module>
  <ModulePrefs title="IñtërnâtiônàlizætiønX" />
  <Content type="html">
    <![CDATA[
    ]]>
  </Content>
</Module>
whose bytes are UTF-8 encoded but whose declaration still says iso-8859-1.
Later in the call (metadata request or gadget rendering),
GadgetSpecParser->parse() loads the XML content into an XML DOM object. This is
where the error occurs, naturally, since the UTF-8 content is flagged as being
in iso-8859-1.
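
To make the mismatch concrete, here is a minimal standalone sketch (not Shindig code). The gadget XML, the shortened title and the readTitle() helper are made up for illustration, and it assumes the mbstring and dom extensions are loaded. The Latin-1 bytes are written as \x escapes so the script does not depend on the encoding of the file it is saved in.

```php
<?php
// Hypothetical gadget spec whose bytes really are ISO-8859-1.
$latin1Title = "I\xF1t\xEBrn\xE2ti\xF4n";
$latin1Xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>"
           . "<Module><ModulePrefs title=\"" . $latin1Title . "\" /></Module>";

// What the fetcher does today: convert the bytes, leave the declaration alone.
$convertedXml = mb_convert_encoding($latin1Xml, 'UTF-8', 'ISO-8859-1');

// Illustrative helper: parse the XML and return the ModulePrefs title.
// DOM hands attribute values back as UTF-8 strings.
function readTitle(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc->getElementsByTagName('ModulePrefs')->item(0)->getAttribute('title');
}

// The title we expect to see, as UTF-8.
$expectedTitle = mb_convert_encoding($latin1Title, 'UTF-8', 'ISO-8859-1');

// Declaration and bytes agree: the parser decodes the title correctly.
$titleFromOriginal = readTitle($latin1Xml);

// Declaration says iso-8859-1 but the bytes are UTF-8: the parser decodes
// every UTF-8 byte as a separate Latin-1 character.
$titleFromConverted = readTitle($convertedXml);

var_dump($titleFromOriginal === $expectedTitle);
var_dump($titleFromConverted === $expectedTitle);
```

In this standalone sketch the second title comes back mis-decoded rather than with characters dropped; what happens to it further along in the real rendering path is not something the sketch tries to reproduce.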
My fix is as follows:
In BasicRemoteContentFetcher->parseResult replace:
     $content = mb_convert_encoding($content, 'UTF-8', $charset);
with:
     $content = mb_convert_encoding($content, 'UTF-8', $charset);
     $pattern = 'encoding=\s*([\'"])' . $charset . '\s*\1';
     $content = mb_ereg_replace($pattern, 'encoding="UTF-8"', $content, "i");
Now the XML is UTF-8 encoded and has the matching UTF-8 encoding attribute.
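
For anyone who wants to try the fix outside Shindig, here is the same logic as a small standalone sketch. The function name convertXmlToUtf8() and the sample gadget are made up for illustration, and it assumes mbstring (with regex support) and dom are available. The explicit mb_regex_encoding() call and the fallback when mb_ereg_replace() does not return a string are my additions for the sketch, not part of the patch above.

```php
<?php
// Hypothetical helper wrapping the patched lines; not a Shindig API.
function convertXmlToUtf8(string $content, string $charset): string
{
    // Step 1: re-encode the bytes from the source charset to UTF-8.
    $content = mb_convert_encoding($content, 'UTF-8', $charset);

    // Assumption for this sketch: make the multibyte regex functions
    // treat the subject as UTF-8, regardless of the ini defaults.
    mb_regex_encoding('UTF-8');

    // Step 2: rewrite encoding="<charset>" (either quote style, any case)
    // so the declaration matches the new bytes.
    $pattern = 'encoding=\s*([\'"])' . $charset . '\s*\1';
    $rewritten = mb_ereg_replace($pattern, 'encoding="UTF-8"', $content, 'i');

    // Keep the converted content if the replace did not return a string.
    return is_string($rewritten) ? $rewritten : $content;
}

// Made-up gadget in ISO-8859-1; note the upper-case charset in the
// declaration, which the "i" option is there to handle.
$latin1Xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
           . "<Module><ModulePrefs title=\"I\xF1t\xEBrn\" /></Module>";

$fixedXml = convertXmlToUtf8($latin1Xml, 'iso-8859-1');

$doc = new DOMDocument();
$doc->loadXML($fixedXml);
$title = $doc->getElementsByTagName('ModulePrefs')->item(0)->getAttribute('title');

var_dump($title === "I\u{F1}t\u{EB}rn");
```

One caveat: $charset goes into the pattern unescaped, so a charset name containing regex metacharacters would need quoting first. The common names (iso-8859-1, windows-1252 and so on) only contain letters, digits and hyphens, which are safe here.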
Justin






                                          