Hi Tomas, thanks for your two messages and the in-depth explanation. Working with -19 and #encoding: UTF-8 indeed solves the issue (tested on Mono).
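
For reference, here is a minimal sketch of the byte count versus character count difference at stake. It assumes a Ruby with 1.9-style encoding-aware strings (which is what I understand -19 to enable); the helper name string_views is made up for illustration, and the literals are written with \u escapes so the result does not depend on how the file is saved.

```ruby
# encoding: UTF-8
# The magic comment above matters when literals contain raw non-ASCII
# characters; the \u escapes below are UTF-8 regardless.

# Hypothetical helper: report character-level and byte-level views of a string.
def string_views(s)
  {
    :chars      => s.size,       # number of characters under the string's encoding
    :bytes      => s.bytesize,   # number of raw bytes (what 1.8's String#size counts)
    :first_char => s[0, 1],      # first character, may span several bytes
    :first_byte => s.getbyte(0)  # first raw byte as an integer
  }
end

hello = string_views("h\u00E8llo")   # "hèllo"
price = string_views("\u20AC2.99")   # "€2.99"

puts hello.inspect
puts price.inspect
```

With encoding-aware strings, "hèllo" has 5 characters but 6 bytes, and the first byte of "€2.99" is 0342 (octal), which is the "\342" that byte-based indexing returns under 1.8.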

> Actually the 1.8 parser is somewhat influenced by the current $KCODE.
> Multi-byte characters could be part of identifiers and also the decision of
> where a string literal ends needs to deal with multi-byte characters.
>
> However, the resulting literals are just plain byte arrays with no knowledge
> of encoding so String#size method is still broken.
>
> To achieve a better .NET interop in IronRuby, we will honor KCODE when
> creating MutableStrings. The representation of the string will be byte[] if
> it contains any non-ascii characters and KCODE is set to a non-ascii
> encoding. We will also attach the KCODE encoding to the MutableString at
> creation time. This doesn’t affect Ruby 1.8 functionality, it only affects
> conversions to CLR string. So if you use KCODE = “U” the CLR strings should
> be correctly encoded (they are not now as you are experiencing). I’ll
> implement this feature as soon as possible.

I think affecting strings only when they are converted to CLR strings is a pretty neat idea. I like that a lot more than having to add #encoding and -19 (also because I'm not sure what the impact of using -19 just for that would be).

Because I was curious, I had a look at the Rails (2.2.2) output for some of these operations:

  Loading development environment (Rails 2.2.2)
  >> "hèllo".size
  => 6
  >> "hèllo".chars
  => #<ActiveSupport::Multibyte::Chars:0x2378348 @wrapped_string="hèllo">
  >> "hèllo".chars.size
  => 5
  >> '€2.99'[0,1]
  => "\342"
  >> '€2.99'.first
  => "€"

So raw access through [] is purely byte-based, while .first takes multibyte characters into account. I think the spirit of what you suggest is fairly close to that. I like it, and will test it once you have it implemented.

cheers and thanks for your idea,

-- Thibaut
_______________________________________________
Ironruby-core mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ironruby-core
