Your message dated Wed, 13 Feb 2008 15:17:36 +0200
with message-id <[EMAIL PROTECTED]>
and subject line Re: Bug#254784: libxml-libxml-perl: toString doesn't return 
the data in the document encoding
has caused the Debian Bug report #254784,
regarding libxml-libxml-perl: toString doesn't return the data in the document 
encoding
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [EMAIL PROTECTED]
immediately.)


-- 
254784: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=254784
Debian Bug Tracking System
Contact [EMAIL PROTECTED] with problems
--- Begin Message ---
Package: libxml-libxml-perl
Version: 1.56-6
Severity: normal

The XML::LibXML::Document(3pm) man page says:

   NOTE: XML::LibXML::Document::toString returns the data in the document
   encoding rather than UTF8! If you want UTF8 encoded XML, you
   have to change the encoding by using setEncoding()
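
For comparison with the element-level call used below, here is a minimal sketch of the document-level behaviour that NOTE describes. It assumes a reasonably recent XML::LibXML and libxml2 and feeds the same content as tst.xml through parse_string instead of reading a file. The document's toString yields a byte string in the declared encoding, and setEncoding() changes the encoding used by later serializations.

```perl
#!/usr/bin/env perl
# Sketch (assumes a recent XML::LibXML): Document::toString returns bytes
# in the document encoding; setEncoding() switches that encoding.
use strict;
use warnings;
use XML::LibXML;

# Same content as tst.xml; \xE9 and \xE8 are the latin1 bytes for
# e-acute and e-grave, so this script itself stays pure ASCII.
my $xml = qq{<?xml version="1.0" encoding="iso-8859-1"?>\n}
        . qq{<root>abcd\xE9\xE8</root>\n};

my $doc = XML::LibXML->new->parse_string($xml);

# Serialized in the document encoding (iso-8859-1): one byte per accent.
my $latin1 = $doc->toString;

# Change the document encoding, then serialize again to get UTF-8 bytes.
$doc->setEncoding('UTF-8');
my $utf8 = $doc->toString;

# Both results are already-encoded byte strings, so write them unchanged.
binmode(STDOUT);
print $latin1, $utf8;
```
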

But the following example shows that toString returns the data in
ASCII, instead of the document encoding.

XML file tst.xml:

<?xml version="1.0" encoding="iso-8859-1"?>
<root>abcdéè</root>

Perl script tostring:

#!/usr/bin/env perl

use strict;
use XML::LibXML;

my $file = shift;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my @nodes = $doc->findnodes("/root");
@nodes == 1 or die;
my $string = $nodes[0]->toString(0);
print "--> $string <--\n";

$ ./tostring tst.xml
--> <root>abcd&#xE9;&#xE8;</root> <--

-- System Information:
Debian Release: testing/unstable
  APT prefers testing
  APT policy: (900, 'testing'), (200, 'unstable')
Architecture: powerpc (ppc)
Kernel: Linux 2.4.18-newpmac
Locale: LANG=POSIX, LC_CTYPE=en_US.ISO8859-1

Versions of packages libxml-libxml-perl depends on:
ii  libc6                       2.3.2.ds1-12 GNU C Library: Shared libraries an
ii  libxml-libxml-common-perl   0.13-4       Perl module for common routines & 
ii  libxml-namespacesupport-per 1.08-3       Perl module for supporting simple 
ii  libxml-sax-perl             0.12-4       Perl module for using and building
ii  libxml2                     2.6.10-3     GNOME XML library
ii  perl                        5.8.3-3      Larry Wall's Practical Extraction 
ii  perl-base [perlapi-5.8.2]   5.8.3-3      The Pathologically Eclectic Rubbis
ii  zlib1g                      1:1.2.1.1-3  compression library - runtime

-- no debconf information


--- End Message ---
--- Begin Message ---
Version: 1.66-1

On Thu, Aug 03, 2006 at 06:26:35PM +0200, Vincent Lefevre wrote:

> On 2006-08-03 17:52:12 +0200, Florian Ragwitz wrote:
> > In my understanding of the documentation XML::LibXML behaves correctly.
> > 
> > Your test code calls the toString method on an XML::LibXML::Element,
> > which ISA XML::LibXML::Node.
> > 
> > XML::LibXML::Node's documentation says WRT toString:
> > 
> >   $xmlstring = $node->toString( $format, $docencoding );
> > 
> >   In addition to the $format flag of XML::LibXML::Document, this
> >   version accepts the optional $docencoding flag. If this flag is set
> >   this function returns the string in its original encoding (the
> >   encoding of the document) rather than UTF-8.
> > 
> > So the default encoding for XML::LibXML::Node::toString is UTF-8,
> > unless specified otherwise.
> 
> Hmm... There *was* a bug since I got ASCII instead of UTF-8 anyway.
> I've tested my example with the new version, and I now get the
> output string in iso-8859-1, whether the document is in iso-8859-1
> or in utf-8, and whether my locales are iso-8859-1 or utf-8.

The documentation was fixed in 1.66, and now reads:

  $xmlstring = $node->toString($format,$docencoding);

  This method is similar to the method "toString" of an XML::LibXML DOM
  Document Class but for a single node. It returns a string consisting of
  the XML serialization of the given node and all its descendants. Unlike
  "XML::LibXML::Document::toString", in this case the resulting
  string is by default a character string (UTF-8 encoded with UTF8
  flag on). An optional flag $format controls indentation, as in
  "XML::LibXML::Document::toString". If the second optional $docencoding
  flag is true, the result will be a byte string in the document encoding
  (see "XML::LibXML::Document::actualEncoding").
  
As the return value is a Perl character string, it is printed as latin1
by default. Using e.g. binmode(STDOUT, ":utf8") gives UTF-8 output,
as explained in perlunicode(1).
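
A minimal sketch of the documented 1.66 semantics, assuming XML::LibXML 1.66 or later and the same content as tst.xml (passed to parse_string rather than read from a file). It contrasts the default character string with the $docencoding byte string, then prints the character string through a :utf8 layer. The exact serialization of the accented characters may differ between libxml2 versions, so the script reports only the encoding and the UTF8 flags rather than relying on a particular output text.

```perl
#!/usr/bin/env perl
# Sketch (assumes XML::LibXML >= 1.66): Node::toString returns a
# character string by default, or bytes in the document encoding when
# the second ($docencoding) argument is true.
use strict;
use warnings;
use XML::LibXML;

# Same content as tst.xml, written with \x escapes (latin1 bytes).
my $xml = qq{<?xml version="1.0" encoding="iso-8859-1"?>\n}
        . qq{<root>abcd\xE9\xE8</root>\n};

my $doc    = XML::LibXML->new->parse_string($xml);
my ($root) = $doc->findnodes('/root');

# Default: a Perl character string (UTF8 flag on).
my $chars = $root->toString(0);

# $docencoding true: a byte string in the document encoding.
my $bytes = $root->toString(0, 1);

printf "document encoding: %s\n", $doc->actualEncoding;
printf "chars has UTF8 flag: %s\n", utf8::is_utf8($chars) ? 'yes' : 'no';
printf "bytes has UTF8 flag: %s\n", utf8::is_utf8($bytes) ? 'yes' : 'no';

# Encode character strings as UTF-8 on output instead of latin1.
binmode(STDOUT, ':utf8');
print "--> $chars <--\n";
```

Writing $bytes to a handle with a :utf8 layer would double-encode it, so the byte string is best printed on a raw handle, or decoded first with Encode::decode using the name returned by actualEncoding.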

I'm closing the bug accordingly. Hope this is satisfactory.

Cheers,
-- 
Niko Tyni   [EMAIL PROTECTED]


--- End Message ---
