On Nov 27, 2007, at 3:08 PM, Dan Janowski wrote:

> The handling of encoding is not coherent in the extension, as my  
> last patch on the topic illustrates. While I have no doubt that  
> there are issues to resolve, in this particular instance I do not  
> get the result you do.
>
> Anyone wanting to look at the way encoding is handled is welcome to  
> make a recommendation.

I just ran a few more experiments. It seems I only get this on Mac OS
X; it works just fine on FreeBSD and Linux (Gentoo). I'll do some more
digging to see if I can identify the cause.
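
Until the cause is known, one possible stopgap on an affected platform
is to post-process the string returned by to_s on a node. The following
is only a minimal sketch under assumptions: decode_numeric_refs and
RESERVED are hypothetical names, not part of libxml-ruby, and the input
string is copied from the output reported in the quoted message below.
It uses only core Ruby and does not touch the parser at all.

```ruby
# Hypothetical helper (NOT part of libxml-ruby): turn numeric character
# references such as &#x3C0; or &#960; back into literal UTF-8 characters.

# Code points for & < > " ' stay escaped so the markup remains well-formed.
RESERVED = [0x26, 0x3C, 0x3E, 0x22, 0x27].freeze

def decode_numeric_refs(xml)
  xml.gsub(/&#(?:x([0-9A-Fa-f]+)|([0-9]+));/) do |ref|
    # $1 holds the hex digits, $2 the decimal digits; only one is set.
    codepoint = $1 ? $1.hex : $2.to_i
    # Skip reserved characters and anything beyond the Unicode range.
    next ref if RESERVED.include?(codepoint) || codepoint > 0x10FFFF
    [codepoint].pack('U')
  end
end

# String copied from the node output reported in this thread.
node_output = '<title>This is a UTF-8 pi: &#x3C0;</title>'
puts decode_numeric_refs(node_output)
```

This only hides the symptom. It does not explain why the node
serialization differs between platforms, and it deliberately leaves
references to markup characters alone.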


--Paul

> On Nov 27, 2007, at 11:41, Paul Dlug wrote:
>
>> There is a serious inconsistency when "round tripping" XML containing
>> UTF-8 characters. If you output the document to a string after
>> parsing, you get the UTF-8 back out; if you just grab a node and
>> convert it to a string, the UTF-8 characters are replaced with
>> numeric character references:
>>
>> utf8test.rb:
>>
>> require 'xml/libxml'
>>
>> xml = <<XML
>> <?xml version="1.0" encoding="UTF-8"?>
>> <title>This is a UTF-8 pi: π</title>
>> XML
>>
>> parser = XML::Parser.new
>> parser.string = xml
>>
>> doc = parser.parse
>>
>> puts doc.to_s
>> puts doc.root.to_s
>>
>>
>> This outputs:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <title>This is a UTF-8 pi: π</title>
>> <title>This is a UTF-8 pi: &#x3C0;</title>
>>
>>
>> I would think that the behavior of to_s by default would be to write
>> the XML out as a string just as it was parsed. Another variant should
>> be provided if character conversion is desirable.
>>
>>
>> --Paul
>> _______________________________________________
>> libxml-devel mailing list
>> libxml-devel@rubyforge.org
>> http://rubyforge.org/mailman/listinfo/libxml-devel
