I have modified Document#to_s to accept a second argument, an
encoding (didn't know there was a first one, eh?). It will not change
the document's encoding, but will cause libxml to produce a
representation of the document in the requested encoding (transcoding
it if necessary). The argument defaults to nil, which results in the
document's own encoding.
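
To make the intended semantics concrete, here is a toy sketch in plain
Ruby. ToyDocument is a hypothetical stand-in, not the libxml-ruby
class; it uses Ruby's own String#encode rather than libxml, and it does
not model the first argument of the real to_s.

```ruby
# Hypothetical stand-in for a document; NOT the libxml-ruby API.
ToyDocument = Struct.new(:xml, :encoding) do
  # encoding = nil means "use the document's own encoding".
  def to_s(encoding = nil)
    target = encoding || self.encoding
    xml.encode(target) # returns a transcoded copy; xml itself is untouched
  end
end

doc = ToyDocument.new('<?xml version="1.0"?><root_node/>', "UTF-8")

default_out = doc.to_s             # falls back to the document's encoding
utf16_out   = doc.to_s("UTF-16LE") # transcoded representation

puts default_out.encoding
puts utf16_out.encoding
puts doc.encoding                  # the document's encoding is unchanged
```

The point of the sketch is only that the argument selects the output
representation and leaves the document itself alone.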

A few other notes about UTF-16 specifically: UTF-16 will result in a
two-byte lead-in (the byte order mark); UTF-16BE will not, nor will
UTF-16LE. These latter encodings are less familiar, but may be of
interest.
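
The lead-in difference can be seen without libxml at all. The sketch
below uses Ruby's own String#encode (Ruby 1.9 or later) as a stand-in,
so it shows the general behaviour of these encoding names, not what
libxml itself emits. Ruby's "UTF-16" writes the mark in big-endian
order (FE FF); the output in the original report looks like the
little-endian form (FF FE).

```ruby
xml = '<root_node/>'

with_mark = xml.encode("UTF-16")   # lead-in (byte order mark) + text
le        = xml.encode("UTF-16LE") # no lead-in, little-endian
be        = xml.encode("UTF-16BE") # no lead-in, big-endian

# Show the first two bytes of each encoding in hex.
hex = lambda { |s| s.bytes.first(2).map { |b| format("%02X", b) }.join(" ") }

puts "UTF-16   starts with #{hex.call(with_mark)} (#{with_mark.bytesize} bytes)"
puts "UTF-16LE starts with #{hex.call(le)} (#{le.bytesize} bytes)"
puts "UTF-16BE starts with #{hex.call(be)} (#{be.bytesize} bytes)"
```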

You were getting two 8-bit characters and nothing else because of the
UTF-16 lead-in, but the output was also getting truncated because the
wrong Ruby string constructor was being called: it did not use the
length returned by the libxml dump, so a NUL byte (^@) ended the
string.
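
The truncation can be simulated in plain Ruby. This is only a sketch:
the byte sequence is an assumed little-endian UTF-16 dump built by
hand, and the NUL-terminated read is imitated with a slice, since the
actual C calls involved are not named here.

```ruby
# Assumed shape of the dump: two-byte lead-in, then UTF-16LE text.
lead_in = [0xFF, 0xFE].pack("C*")
body    = '<?xml version="1.0"?>'.encode("UTF-16LE").b
dump    = lead_in + body

# A NUL-terminated read stops at the first zero byte, which in UTF-16LE
# is the high byte of the very first ASCII character.
nul_terminated = dump[0, dump.index("\0".b)]

# A length-aware read keeps every byte that the dump reported.
length_aware = dump[0, dump.bytesize]

puts nul_terminated.bytes.map { |b| format("%02X", b) }.join(" ")
puts "#{nul_terminated.bytesize} of #{length_aware.bytesize} bytes survive"
```

Viewed as Latin-1, the three surviving bytes FF FE 3C correspond to the
"ÿþ<" in the original report.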

In other words, it was always broken (I had not previously modified
this code); now it is less broken.

Dan

On Nov 26, 2007, at 22:38, Dan Janowski wrote:

> I don't have 0.3x on my system anymore, but I do not think UTF16 will
> behave any differently. .to_s is written incorrectly, from what I can
> tell, since it just feeds the encoding of the document back into the
> formatter. But in either case, if you want the as-encoded document,
> you really want to use doc.dump.
>
> Encoding has never worked correctly within the library. It only
> functions properly when fed UTF-8 as I have had to employ Iconv for
> anything else.
>
>
> Dan
>
> On Nov 26, 2007, at 16:05, Tim Perrett wrote:
>
>> Hey Chaps
>>
>> There seems to be some kind of issue with UTF-16 encoding in libxml-
>> ruby version 0.5.2.0.
>>
>> When I do this:
>>
>> doc = XML::Document.new()
>> # doc.encoding = 'utf-16'
>> doc.root = XML::Node.new('root_node')
>> root = doc.root
>> puts doc
>> ## => <?xml version="1.0"?><root_node/>
>>
>> Uncomment the encoding however and you get this:
>>
>> doc = XML::Document.new()
>> doc.encoding = 'utf-16'
>> doc.root = XML::Node.new('root_node')
>> root = doc.root
>> puts doc
>> ## => ÿþ<
>>
>> Any idea what's going on here and how to fix it? The encoding features
>> used to work no problem at all. I'm running ruby 1.8.6 (2007-06-07
>> patchlevel 36) [universal-darwin9.0]
>>
>> Cheers
>>
>> Tim
>>
>>
>> _______________________________________________
>> libxml-devel mailing list
>> libxml-devel@rubyforge.org
>> http://rubyforge.org/mailman/listinfo/libxml-devel
