That's a very interesting (puzzling, frustrating) problem. I suspect it is
related to Perl's "utf8 flag". From the perlunicode man page:
> By default, there is a fundamental asymmetry in Perl's Unicode model:
> implicit upgrading from byte strings to Unicode strings assumes that they
> were encoded in *ISO 8859-1 (Latin-1)*, but Unicode strings are downgraded
> with UTF-8 encoding. This happens because the first 256 codepoints in
> Unicode happens to agree with Latin-1.
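I infer from the behavior that Mason does not mark its output as Unicode,
so its buffer is treated as a byte string. XML::DOM, on the other hand,
almost certainly returns Unicode strings (as required by the XML spec).
When you print XML::DOM's output, you force Mason's buffer to be upgraded
to Unicode. As quoted above, Perl assumes in this situation that the
existing bytes are ISO 8859-1. Since your bytes are actually windows-1251,
the high characters are mis-translated: each Cyrillic byte is read as a
Latin-1 character and then comes out as two UTF-8 bytes. This also fits
your telnet output: the first response is a 4-byte chunk ("ббб\n"), while
the second is a 0xc = 12-byte chunk, i.e. three two-byte letters, plus the
newline, plus "TOVAR".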
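To see the asymmetry in isolation, here is a minimal sketch that runs
outside Mason. It assumes your source file is saved in windows-1251 (so "б"
is the single byte 0xE1), uses `utf8::upgrade` on a plain ASCII string as a
stand-in for the Unicode strings XML::DOM hands back, and uses
`utf8::encode` to mimic the server writing out the upgraded buffer as UTF-8.
That last step is my guess from the chunk sizes, not something I checked in
the mod_perl source.

```perl
use strict;
use warnings;

# "ббб\n" as raw windows-1251 bytes: 0xE1 is Cyrillic "б" in
# windows-1251, but Latin-1 "á" (U+00E1).
my $bytes = "\xE1\xE1\xE1\n";

# Stand-in for the tag name returned by XML::DOM: same text,
# but with Perl's utf8 flag switched on.
my $tag = 'TOVAR';
utf8::upgrade($tag);

# Joining a byte string to a flagged string upgrades the bytes,
# reading each one as a Latin-1 character.
my $buffer = $bytes . $tag;

printf "flagged: %s\n", utf8::is_utf8($buffer) ? 'yes' : 'no';
printf "first char: U+%04X\n", ord $buffer;    # U+00E1, not U+0431 ("б")
printf "characters: %d\n", length $buffer;     # 9

# Assumed serialisation step: write the upgraded buffer out as UTF-8,
# which turns each of the three letters into two bytes.
my $wire = $buffer;
utf8::encode($wire);
printf "bytes on the wire: %d\n", length $wire;  # 12, matching the "c" chunk
```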
All that said, I don't have much experience with Unicode, and I don't know
how to *fix* this. Maybe you want to try ['use encoding "windows-1251";'][1]?
I recommend reading through [perlunicode][2] and perhaps the [Encode man
page][3]. If you do get it worked out, please post your solution to the
list; I would love to know how you fixed it.
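One approach worth trying, offered only as an untested sketch, is to keep
Mason's buffer as windows-1251 bytes and encode every string that comes
out of XML::DOM back to windows-1251 before printing it, so the buffer is
never upgraded. It uses `Encode::encode` with `cp1251` (Encode's name for
windows-1251); the `utf8::upgrade` call again only simulates XML::DOM's
output. In your component this would amount to something like
`print encode('cp1251', $a);` in place of `print $a;`.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Literal text from a windows-1251 source file: plain bytes.
my $bytes = "\xE1\xE1\xE1\n";

# Simulated XML::DOM result: a character string with the utf8 flag on.
my $tag = 'TOVAR';
utf8::upgrade($tag);

# Convert the characters back to windows-1251 octets before appending,
# so the buffer stays a byte string and nothing is reread as Latin-1.
my $buffer = $bytes . encode('cp1251', $tag);

printf "flagged: %s\n", utf8::is_utf8($buffer) ? 'yes' : 'no';  # no
printf "bytes: %d\n", length $buffer;                           # 9
```

Characters with no windows-1251 equivalent would be replaced by Encode's
substitution character under the default CHECK setting. The more thorough
route that perlunicode describes is to decode all input to characters and
encode once on output, but that means touching every literal and every
input source, not just the XML::DOM calls.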
[1]: http://perldoc.perl.org/encoding.html
[2]: http://perldoc.perl.org/perlunicode.html
[3]: http://perldoc.perl.org/Encode.html
Good luck,
Vince Veselosky
http://www.webquills.net
2008/3/18 Konstantin Stroikovsky <[EMAIL PROTECTED]>:
> Hi,
>
> I have a problem when using any national charset while handling XML
> with XML::DOM.
> Here is an example program:
> #############################################
> <%init>
> use XML::DOM;
>
> print "ббб\n"; # Some text in a non-English charset
>
> my $s = '<?xml version="1.0" encoding="windows-1251"?><TOVAR></TOVAR>';
>
>
> my $parser = XML::DOM::Parser->new;
> my $doc = $parser->parse($s);
> my $root = $doc->getDocumentElement();
> my $a = $root->getTagName();
> # print $a if ($a eq 'TOVAR'); # !!! THE REAL MAGIC HERE !!!
>
>
>
> </%init>
> <%flags>
> inherit => undef
> </%flags>
> ##############################################
>
> Server answer:
>
> --------------------------------------------------------------------------------
> [EMAIL PROTECTED]:~]$ telnet xxxxx.ru 80
> Trying xx.xx.xx.xx...
> Connected to xxxxx.ru.
> Escape character is '^]'.
> GET /test.html HTTP/1.1
> Host: xxxxx.ru
>
> HTTP/1.1 200 OK
> Date: Tue, 18 Mar 2008 13:50:42 GMT
> Server: Apache/1.3.37 (Unix) mod_perl/1.30 PHP/5.2.3 with Suhosin-Patch
> Pragma: no-cache
> Cache-control: no-cache
> Expires: Thu, 01 Jan 1970 00:00:01 GMT
> Transfer-Encoding: chunked
> Content-Type: text/html; charset=windows-1251
> Content-Language: ru
>
> 4
> ббб <- All looks fine !!!
>
> 0
> Connection closed by foreign host.
> [EMAIL PROTECTED]:~]$
>
> --------------------------------------------------------------------------------
>
>
> Now remove the comment character (#) before the last print. Server answer:
>
> --------------------------------------------------------------------------------
> [EMAIL PROTECTED]:~]$ telnet xxxxx.ru 80
> Trying xx.xx.xx.xx...
> Connected to xxxxx.ru.
> Escape character is '^]'.
> GET /test.html HTTP/1.1
> Host: xxxxx.ru
>
> HTTP/1.1 200 OK
> Date: Tue, 18 Mar 2008 13:57:47 GMT
> Server: Apache/1.3.37 (Unix) mod_perl/1.30 PHP/5.2.3 with Suhosin-Patch
> Pragma: no-cache
> Cache-control: no-cache
> Expires: Thu, 01 Jan 1970 00:00:01 GMT
> Transfer-Encoding: chunked
> Content-Type: text/html; charset=windows-1251
> Content-Language: ru
>
> c
> ццц <- Broken characters !!!
> TOVAR
> 0
>
> Connection closed by foreign host.
> [EMAIL PROTECTED]:~]$
>
> --------------------------------------------------------------------------------
> What may be wrong? How can I solve it?
> (The program works correctly under plain mod_perl.)
>
>
> --------------------------------------------------------------------------------
> [EMAIL PROTECTED]:~]$ pkg_info | grep Mason
> bsdpan-Syntax-Highlight-Mason-1.21 Syntax::Highlight::Mason - Perl extension to Highlight HTML
> p5-HTML-Mason-1.35 High-performance, dynamic web site authoring system
> [EMAIL PROTECTED]:~]$ pkg_info | grep apache
> apache-1.3.37_4 The extremely popular Apache http server. Very fast, very c
> [EMAIL PROTECTED]:~]$ perl -v
>
> This is perl, v5.8.8 built for i386-freebsd-64int
>
> --------------------------------------------------------------------------------
>
> Thanks!
> --
> Konstantin Stroikovski
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2008.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> Mason-users mailing list
> Mason-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mason-users
>