On Nov 27, 2007, at 3:08 PM, Dan Janowski wrote:

> The handling of encoding is not coherent in the extension, as my
> last patch on the topic illustrates. While I have no doubt that
> there are issues to resolve, in this particular instance I do not
> get the result you do.
>
> Anyone wanting to look at the way encoding is handled is welcome to
> make a recommendation.

I just did a few more experiments. It seems I only get this on Mac OS X;
it works just fine on FreeBSD and Linux (Gentoo). I'll do some more
digging to see if I can identify the cause.

--Paul

> On Nov 27, 2007, at 11:41, Paul Dlug wrote:
>
>> There is a serious inconsistency when "round tripping" XML containing
>> UTF-8 characters. If you output the document to a string after
>> parsing, you get the UTF-8 back out; if you just grab a node and
>> convert it to a string, you get UTF-8 characters substituted with
>> entities:
>>
>> utf8test.rb:
>>
>> require 'xml/libxml'
>>
>> xml = <<XML
>> <?xml version="1.0" encoding="UTF-8"?>
>> <title>This is a UTF-8 pi: π</title>
>> XML
>>
>> parser = XML::Parser.new
>> parser.string = xml
>>
>> doc = parser.parse
>>
>> puts doc.to_s
>> puts doc.root.to_s
>>
>>
>> This outputs:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <title>This is a UTF-8 pi: π</title>
>> <title>This is a UTF-8 pi: &#x3C0;</title>
>>
>>
>> I would think that the behavior of to_s by default would be to write
>> the XML out as a string just as it was parsed. Another variant should
>> be provided if character conversion is desirable.
>>
>>
>> --Paul

_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel
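
Until the cause is identified, one possible stopgap for the entity substitution described above is to decode numeric character references back into UTF-8 after calling to_s on a node. The following is only a sketch in plain Ruby: decode_numeric_entities is a hypothetical helper, not part of the libxml bindings, and it assumes the string is already UTF-8. References below 128 (such as &#38; or &#60;) are left alone so that escaped markup characters stay escaped.

```ruby
# Hypothetical helper, not part of libxml-ruby: turns numeric character
# references such as &#x3C0; or &#960; back into UTF-8 characters.
def decode_numeric_entities(str)
  str.gsub(/&#(x[0-9a-fA-F]+|[0-9]+);/) do |match|
    ref = Regexp.last_match(1)
    # "x..." is a hexadecimal reference, anything else is decimal.
    code = ref.start_with?('x') ? ref[1..-1].hex : ref.to_i
    # Leave ASCII references untouched so escaped '<' and '&' stay escaped.
    code < 128 ? match : [code].pack('U')
  end
end
```

With the script above, this would be applied as decode_numeric_entities(doc.root.to_s). It only papers over the symptom; it does not explain why the node output differs between platforms.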