On Nov 14, 2009, at 15:44 , MacRuby wrote:

> #339: YAML error with UTF-16 string
> ---------------------------+------------------------------------------------
> Reporter:  d...@…          |        Owner:  lsansone...@…        
>     Type:  defect         |       Status:  closed               
> Priority:  critical       |    Milestone:  MacRuby 0.5          
> Component:  MacRuby        |   Resolution:  fixed                
> Keywords:  YAML encoding  |  
> ---------------------------+------------------------------------------------
> 
> Comment(by jazz...@…):
> 
> {{{
> $ macruby -e 'require "yaml"; puts "Rübe".to_yaml'
> --- "R\xFCbe"
> $ ruby1.9 -e 'require "yaml"; puts "Rübe".to_yaml'
> --- "R\xC3\xBCbe"
> }}}
> 
> Seems to work now! MacRuby escapes to UTF-16 and Ruby 1.9 escapes to
> UTF-8.

Actually, it seems to me (though I'm willing to be corrected on this) that the 
ruby1.9 output is simply wrong: it translates the accented character into 
UTF-8, and then escapes the two UTF-8 bytes separately. What this ends up 
encoding is "RÃ¼be", which is not what you want.
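To make the difference concrete, here is a minimal Ruby sketch of the two escaping strategies as I understand them. The helper names are made up for illustration; this is not the code of either implementation, and the per-code-point variant only makes sense for characters up to U+00FF.

```ruby
# Hypothetical helpers, for illustration only; neither is MacRuby's or syck's code.

# Escape every non-ASCII UTF-8 *byte* as \xNN (the shape of the ruby1.9 output).
def escape_per_utf8_byte(str)
  str.encode("UTF-8").bytes.map { |b| b < 0x80 ? b.chr : format('\\x%02X', b) }.join
end

# Escape every non-ASCII *code point* as \xNN (the shape of the MacRuby output).
# Only meaningful up to U+00FF; larger code points would need \u or \U.
def escape_per_code_point(str)
  str.codepoints.map { |cp| cp < 0x80 ? cp.chr : format('\\x%02X', cp) }.join
end

word = "R\u00FCbe" # "Rübe"
puts escape_per_utf8_byte(word)   # R\xC3\xBCbe
puts escape_per_code_point(word)  # R\xFCbe
```

The first form carries two escapes for a single character, so a reader that treats each \xNN as one character ends up with five characters instead of four.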

> I didn't find anything in YAML docs that describes that behaviour, both 
> methods seem to be correct.

They can't possibly be BOTH correct, as interpreting the output of one 
according to the theory of the other would give a different result. If you look 
at the section in the YAML spec: 
<http://www.yaml.org/spec/1.2/spec.html#id2776092>, you will see 

        [57] "Escaped 8-bit Unicode character."

This is NOT a UTF-8 byte: \xNN stands for the Unicode character U+00NN.
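Read that way, the two outputs decode differently. The following is a rough Ruby sketch of a decoder that handles only the \xNN escape and nothing else from the spec; decode_x_escapes is a name I made up for this mail, not a function from any YAML library.

```ruby
# Hypothetical helper: decode only \xNN escapes in the body of a
# double-quoted scalar, treating NN as the code point U+00NN.
def decode_x_escapes(body)
  body.gsub(/\\x(\h{2})/) { [$1.hex].pack("U") }
end

puts decode_x_escapes('R\xFCbe')      # Rübe
puts decode_x_escapes('R\xC3\xBCbe')  # RÃ¼be
```

A loader that instead reassembles the escapes as UTF-8 bytes would accept the second form but could no longer represent the first one correctly, which is why the two conventions cannot coexist.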

> But ruby 1.8 fails to load the UTF-16 YAML. That is not astonishing because 
> IMHO there is no way to guess what is the correct escaping mode.

It's not astonishing because (a) 1.8 has very poor Unicode support anyway and 
(b) this would hardly be the only bug in syck.

> I think escaping is not necessary here because the encoding of input and
> output is the same. This can easily be tested by
> 
> {{{
> $ macruby -e 'require "yaml"; puts YAML::load "--- Rübe"'
> Rübe
> }}}

That's an interesting point. I think you're right that the YAML spec does not 
require escaping of printable characters >\u007F. However, non-printable 
characters DO have to be escaped, and for the printable ones, it could be 
argued that erring on the side of escaping helps readability if the OS does not 
have font coverage for some printable characters. In any case, the current 
implementation tries to be conservative in what it generates and liberal in 
what it accepts. I'm open to persuasion that we should avoid escaping 
characters, provided there is a low-cost test for printability of general 
Unicode characters (I have not yet checked whether one of the built-in 
CFCharacterSets can give that; the descriptions were inconclusive).
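One cheap candidate is the YAML spec's own c-printable production, which is a plain set of code point ranges. Whether that matches the notion of "printable" needed here (it says nothing about font coverage) is an assumption on my part. A sketch in Ruby, with made-up names, that escapes only what falls outside those ranges and ignores the quote and backslash escaping a real emitter would also need:

```ruby
# Ranges of the YAML 1.2 c-printable production (production [1] in the spec).
YAML_PRINTABLE_RANGES = [
  0x09..0x09, 0x0A..0x0A, 0x0D..0x0D, 0x20..0x7E, 0x85..0x85,
  0xA0..0xD7FF, 0xE000..0xFFFD, 0x10000..0x10FFFF
].freeze

# Hypothetical predicate: is this code point printable in the YAML sense?
def yaml_printable?(codepoint)
  YAML_PRINTABLE_RANGES.any? { |range| range.cover?(codepoint) }
end

# Hypothetical emitter fragment: leave printable characters alone and escape
# the rest with the shortest of \x, \u, \U that can hold the code point.
def escape_nonprintable(str)
  str.each_char.map do |ch|
    cp = ch.ord
    if yaml_printable?(cp) then ch
    elsif cp <= 0xFF       then format('\\x%02X', cp)
    elsif cp <= 0xFFFF     then format('\\u%04X', cp)
    else                        format('\\U%08X', cp)
    end
  end.join
end

puts escape_nonprintable("R\u00FCbe")  # Rübe
puts escape_nonprintable("a\u0007b")   # a\x07b
```

The range check costs a handful of integer comparisons per character, so it would be cheap enough for the emitter; the open question is only whether it is the right definition.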

Matthias
 
_______________________________________________
MacRuby-devel mailing list
MacRuby-devel@lists.macosforge.org
http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
