Andrzej Talarczyk wrote:

I have successfully used AxKit for several months on my Debian Linux
box. Recently, I decided to migrate one of the sites to a FreeBSD
box and I have been stumped by an unexpected problem. I use AxKit to
generate HTML files from XML source encoded in iso-8859-2 in a
processing chain that looks like this:

file.xml (iso-8859-2) -> XSP -> XSLT -> XSLT -> HTML (iso-8859-2)

On Debian, the intermediate XML structures (as seen with
AxTraceIntermediate) are UTF-8-encoded and the final HTML output is
properly converted to iso-8859-2 by the second XSLT stylesheet.

On FreeBSD, however, the first intermediate (after XSP processing) looks
as if there were an additional, unnecessary conversion step from
iso-8859-2 to UTF-8: e.g. one iso-8859-2 code is converted to UTF-8 (two
codes) and then each of them is converted once again, resulting in 4
codes from a single iso code. Needless to say, the final HTML page is
mangled, too.

I've had a look at the package AxKit::XSP::SAXParser in XSP.pm and
noticed that the difference between my two boxes is in the function
process_node(). The string this function gets as a parameter is already
UTF-8-encoded, but it is further processed by encodeToUTF8(). On the
Debian box encodeToUTF8() does nothing, i.e. the output string is
identical to the input string. On the FreeBSD box, however, the input
string (already UTF-8) is converted once again, resulting in garbage
wherever non-ASCII characters were present.

I made a quick and dirty patch which fixes my problem by removing the
encodeToUTF8() call in this place, but I'm not sure whether this breaks
something else. Unfortunately, I don't have time at the moment for a
more thorough investigation, but I'm posting this info here in case
someone is interested or has a similar problem.
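
One possible alternative to removing the call outright would be to
convert only when the data is not already UTF-8. The sketch below shows
the idea in Python; to_utf8_once is a hypothetical helper, not part of
AxKit or XML::LibXML. The validity check is only a heuristic, since some
iso-8859-2 byte sequences also happen to be valid UTF-8, so this is a
starting point under that assumption and not a tested fix.

```python
def to_utf8_once(data: bytes, source_encoding: str) -> bytes:
    """Return data as UTF-8, converting from source_encoding only if needed.

    Hypothetical helper for illustration; not an AxKit function.
    """
    try:
        data.decode("utf-8")  # already valid UTF-8: leave it untouched
        return data
    except UnicodeDecodeError:
        # Not UTF-8, so assume it is still in the source encoding.
        return data.decode(source_encoding).encode("utf-8")


raw = "\u0105".encode("iso-8859-2")            # b"\xb1", not valid UTF-8
converted = to_utf8_once(raw, "iso-8859-2")    # converted exactly once
again = to_utf8_once(converted, "iso-8859-2")  # second call changes nothing
print(converted == again)
```

With a guard like this, passing already-converted text through the same
step a second time leaves it unchanged, which is the behaviour the
Debian box shows.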

--
Andrzej

*** XSP.pm_ORG  Mon Jan 20 13:33:34 2003
--- XSP.pm      Mon Jan 20 13:37:39 2003
***************
*** 970,976 ****
      }
      elsif ($node_type == XML_TEXT_NODE || $node_type == XML_CDATA_SECTION_NODE) {
          # warn($node->getData . "\n");
!           $handler->characters( { Data => encodeToUTF8($encoding,$node->getData()) } );
      }
      elsif ($node_type == XML_ELEMENT_NODE) {
          # warn("<" . $node->getName . ">\n");
--- 970,976 ----
      }
      elsif ($node_type == XML_TEXT_NODE || $node_type == XML_CDATA_SECTION_NODE) {
          # warn($node->getData . "\n");
!            $handler->characters( { Data => $node->getData() } );
      }
      elsif ($node_type == XML_ELEMENT_NODE) {
          # warn("<" . $node->getName . ">\n");
