Setting $KCODE = "U" doesn't actually affect the encoding of the literal in the 
same compilation unit. It only affects literals that are parsed after $KCODE 
is set.

$KCODE = "U"
x = "日本語"
p x.Encoding                    # => ASCII-8BIT, since the current compilation unit
                                #    (a file) was parsed using BINARY encoding
p x.size                        # => 9 bytes

y = eval('"日本語"')
p y.Encoding                    # => KCODE: UTF8
p y.size                        # => 9, since String#size in MRI 1.8.6 doesn't
                                #    understand encodings; it counts bytes

c = x.to_clr_string             # this essentially creates a string whose non-ASCII
                                # characters are not correctly encoded as UTF-8
                                # (each UTF-8 byte is widened to a 16-bit char)
p c.size                        # => 9 characters
p c.Encoding                    # => UTF-8, since a CLR string doesn't hold on to an
                                #    encoding. When you ask for its bytes we need to
                                #    use some encoding. Maybe we could choose UTF-16,
                                #    but MRI 1.8.6 has at least some support for UTF-8.

d = y.to_clr_string             # correctly encoded string
p d.Encoding                    # => UTF-8
p d.size                        # => 3 characters
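
To make the difference between c and d concrete, here is a minimal sketch in modern Ruby (1.9+), not IronRuby or MRI 1.8.6, so it only imitates what to_clr_string does. It assumes that "widening" means each UTF-8 byte becomes one 16-bit char with the same value (a Latin-1 style decode). The variable names and the utf16_units helper are hypothetical, chosen for illustration; counting UTF-16 code units stands in for what a CLR string's length reports.

```ruby
# encoding: utf-8
# Modern Ruby sketch (not IronRuby): imitates c vs d from the example above.
utf8 = "日本語"   # UTF-8 literal: 3 characters, 9 bytes

# Hypothetical "widening": treat every byte as its own character U+0000..U+00FF,
# like copying each UTF-8 byte into a separate 16-bit CLR char.
widened = utf8.b.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# Hypothetical helper: number of UTF-16 code units, i.e. what a CLR string's
# Length would report for the same text.
def utf16_units(str)
  str.encode(Encoding::UTF_16LE).bytesize / 2
end

puts utf8.bytesize          # 9 bytes, as MRI 1.8.6 String#size reports
puts utf16_units(utf8)      # 3, like d.size
puts utf16_units(widened)   # 9, like c.size

# The widening is reversible: narrow each char back to one byte, then retag as UTF-8.
repaired = widened.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
puts repaired == utf8       # true
```

The round trip works because a Latin-1 style decode maps every byte value to exactly one code point, so no information is lost, only the character boundaries are wrong.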

Encodings are not very well supported in 1.8.6, and it is difficult to implement 
good interop between CLR and MRI strings. This should get better in the next 
version of IronRuby, which will target compatibility with 1.9.
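
For comparison, a minimal sketch of the 1.9 model, assuming plain modern Ruby rather than IronRuby: every string carries its own encoding tag, so the same 9 bytes count as 9 characters when tagged BINARY (like x) and as 3 when tagged UTF-8 (like y should be). The names bytes and text are illustrative only.

```ruby
# encoding: utf-8
# Ruby 1.9+ sketch: identical bytes, different encoding tags.
bytes = "日本語".b                                    # tagged BINARY (ASCII-8BIT), like x
text  = bytes.dup.force_encoding(Encoding::UTF_8)     # same bytes, retagged as UTF-8

puts bytes.encoding == Encoding::BINARY   # true
puts bytes.size                           # 9: one character per byte
puts text.encoding == Encoding::UTF_8     # true
puts text.size                            # 3: characters, not bytes
puts text.bytesize                        # 9: the underlying bytes are unchanged
puts bytes.bytes == text.bytes            # true
```

force_encoding only changes the tag and never converts bytes, which is why it can fix a string whose bytes are already valid UTF-8 but were tagged BINARY at parse time.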

Tomas

-----Original Message-----
From: ironruby-core-boun...@rubyforge.org 
[mailto:ironruby-core-boun...@rubyforge.org] On Behalf Of Daniele Alessandri
Sent: Monday, March 15, 2010 1:48 PM
To: ironruby-core@rubyforge.org
Subject: [Ironruby-core] $KCODE, -KU and CLR strings

Hi everyone,
please consider this snippet:

$KCODE = "U"
puts "日本語".to_clr_string.length

When I run it by launching ir.exe without any option I get 9 as an output (each 
character in that string is actually made up of 3 bytes with UTF-8 encoding), 
and when I do the same with the -KU option being passed to ir.exe I get 3. 
Aside from the fact that I think 3 should be considered the right behaviour 
here, shouldn't setting $KCODE = "U" alone have the same effect as starting 
ir.exe with the -KU option?

Thanks,
Daniele

--
Daniele Alessandri
http://www.clorophilla.net/
http://twitter.com/JoL1hAHN
_______________________________________________
Ironruby-core mailing list
Ironruby-core@rubyforge.org
http://rubyforge.org/mailman/listinfo/ironruby-core