On Wednesday, Mar 19, 2003, at 13:56 Europe/London, Andrzej Talarczyk wrote:

Hi everyone,

finally, I found some free time to have another look at my problem, described in http://www.mail-archive.com/[EMAIL PROTECTED]/msg02385.html .
In short, I have an iso-8859-2-encoded XSP page that I want to transform with AxKit. It turns out that after the first processing step, which is XSP, the resulting intermediate XML document is marked as "UTF-8" in its header but in fact it is "double-utf-8-encoded".


Having a closer look at the way AxKit::XSP::SAXParser acts, I realized that before the XSP handler() is called, the data is still iso-8859-2 but the $doc structure returned from $parser->parse_fh() is UTF-8. Nevertheless, $doc->getEncoding() returns 'iso-8859-2'. Because of that, process_node() has the idea the data in $doc is iso-8859-2-encoded and applies encodeToUTF8(). This results in "double-utf" conversion.
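
The double conversion described above can be reproduced at the byte level. The following is only a minimal sketch in Python, not the AxKit or XML::LibXML code itself: the sample string and the variable names are made up for illustration, and the "faulty step" merely imitates what happens when data that is already UTF-8 is treated as iso-8859-2 and converted to UTF-8 once more.

```python
# Illustration only: a Python stand-in for the behaviour described in the thread.
# The sample text and all names here are hypothetical, not taken from AxKit.

# Polish sample text ("zazolc" with diacritics), representable in iso-8859-2.
text = "za\u017c\u00f3\u0142\u0107"

# Bytes as stored in the iso-8859-2 source file.
latin2_bytes = text.encode("iso-8859-2")

# Bytes as held by the parsed document, which is already UTF-8 internally.
utf8_bytes = text.encode("utf-8")

# Faulty step: the declared encoding still says iso-8859-2, so the UTF-8
# bytes are interpreted as iso-8859-2 and converted to UTF-8 a second time.
double_encoded = utf8_bytes.decode("iso-8859-2").encode("utf-8")

# Assuming the parsed data is always UTF-8 means no second conversion is applied.
passed_through = utf8_bytes

print(len(latin2_bytes), len(utf8_bytes), len(double_encoded))
print(double_encoded == passed_through)
```

Each non-ASCII character grows from one byte in iso-8859-2 to two bytes in UTF-8, and to four bytes once the conversion has been applied twice, which is why the output is labelled UTF-8 yet displays as garbage.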

I assumed in my patch that $doc is always UTF-8 which fixes my problem. Still, I'm not that much into XML::LibXML internals, so I'm not 100% sure that keeping the encoding name as 'iso-8859-2' by $doc is the right way to do it. If it is, then the SAXParser in XSP.pm makes IMHO the wrong assumption on the $doc content.

I'll have to wait for Jörg to weigh in on this one - he's the one who put in all that decode stuff into XSP.pm - I wasn't sure it was required but he convinced me at the time ;-)


Matt.

