When doing force_encoding, convert to a ByteString in the old encoding, then try to convert to an NSString in the new encoding. If we succeed, great. If not, leave as a tagged ByteString (and probably whine about it).

That's actually wrong. All force_encoding does is change the encoding attribute of the string; it shouldn't change the internal encoding of the bytes. The encoding attribute is basically a switch that describes which set of string methods should be used on the bytes.
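To make the distinction concrete, here is a minimal sketch in stock Ruby 1.9+ (not MacRuby), contrasting force_encoding with encode. The byte values are just an illustration: they are the UTF-8 encoding of a single hiragana character.

```ruby
# Three bytes that spell one character when read as UTF-8.
original_bytes = [0xE3, 0x81, 0x82]

s = original_bytes.pack('C*')          # starts out as a binary (ASCII-8BIT) string
s.force_encoding(Encoding::UTF_8)      # relabel only: the bytes are untouched

# Relabelling again as Shift JIS still leaves the bytes alone.
relabelled = s.dup.force_encoding(Encoding::Shift_JIS)

# encode, by contrast, transcodes: same character, different bytes.
transcoded = s.encode(Encoding::Shift_JIS)

puts relabelled.bytes == original_bytes   # bytes preserved by force_encoding
puts transcoded.bytes == original_bytes   # bytes changed by encode
```

The relabelled string keeps the original bytes under a new tag, while the transcoded one keeps the character and changes the bytes.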

We have to go through this dance to get force_encoding to play nicely with NSString. Namely, NSString is always backed by an array of UTF-16 code units. So, to reinterpret, we have to convert the internal rep back out to whatever the old external encoding was, then back in, converting to UTF-16 from the new external encoding.
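A rough sketch of that round trip, written in plain Ruby rather than against NSString. The `reinterpret` helper is hypothetical, a UTF-16LE Ruby string stands in for the NSString backing store, and the fallback to a tagged byte string mirrors the "leave as a tagged ByteString" idea from earlier in the thread; none of this is MacRuby's actual implementation.

```ruby
# Hypothetical helper: simulate force_encoding on a string whose internal
# representation is always UTF-16.
def reinterpret(internal_utf16, old_encoding, new_encoding)
  external = internal_utf16.encode(old_encoding)  # back out to the original bytes
  external.force_encoding(new_encoding)           # relabel, bytes untouched
  external.encode(Encoding::UTF_16LE)             # back in, decoding as the new encoding
rescue EncodingError
  external                                        # could not decode: keep tagged bytes
end

# UTF-8 bytes for one accented letter, mistakenly read in as ISO-8859-1.
wrong = [0xC3, 0xA9].pack('C*')
                    .force_encoding(Encoding::ISO_8859_1)
                    .encode(Encoding::UTF_16LE)
fixed = reinterpret(wrong, Encoding::ISO_8859_1, Encoding::UTF_8)

# A byte that is never valid UTF-8 takes the fallback path.
bad      = [0xFF].pack('C*').force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_16LE)
fallback = reinterpret(bad, Encoding::ISO_8859_1, Encoding::UTF_8)
```

Here `fixed` ends up as a single UTF-16 code unit, while `fallback` is the raw byte tagged as UTF-8.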

We're in the same hypothetical HTTP library as before, and this library author has decided to
_always_ force encoding to Shift JIS because he hates humanity:

 response = HTTP.get('http://example.com')
 response.body.encoding #=> Encoding::Shift_JIS

If MacRuby internally forces the body encoding to Shift JIS, information might get lost. So when
someone decides to make it right afterwards:

 encoding = response.header['Content-type'].split(';').last.split('=').last
 encoding #=> 'utf-8'
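The one-liner above assumes the charset is the last parameter in the header. A slightly more defensive sketch might look like this; `charset_from` and `encoding_from` are hypothetical helpers, not part of any HTTP library, and only `Encoding.find` is a real Ruby call.

```ruby
# Hypothetical helper: pull the charset parameter out of a Content-Type value.
def charset_from(content_type)
  content_type.split(';').drop(1).each do |param|
    name, value = param.strip.split('=', 2)
    next unless name && value
    return value.delete('"') if name.casecmp('charset').zero?
  end
  nil
end

# Hypothetical helper: map that charset name to an Encoding, or nil if it
# is missing or unknown to Ruby.
def encoding_from(content_type)
  name = charset_from(content_type)
  name && Encoding.find(name)
rescue ArgumentError
  nil
end

p encoding_from('text/html; charset=utf-8')
p encoding_from('text/html')
```

Returning nil for a missing or unknown charset leaves the decision about a default to the caller.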

They might get into trouble here:

 response.body.force_encoding(Encoding::UTF_8)

Because:

 Encoding.compatible?(Encoding::Shift_JIS, Encoding::UTF_8) #=> nil
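For comparison, this is how the scenario plays out in stock Ruby 1.9+, where force_encoding only relabels the bytes. It is a sketch with made-up byte values (the UTF-8 encoding of one kanji) standing in for the response body, and it says nothing about what MacRuby does internally.

```ruby
utf8_bytes = [0xE6, 0x97, 0xA5]   # one kanji, encoded as UTF-8

# The library mislabels the body as Shift JIS.
body = utf8_bytes.pack('C*').force_encoding(Encoding::Shift_JIS)

# The two encodings are not compatible...
compat = Encoding.compatible?(Encoding::Shift_JIS, Encoding::UTF_8)

# ...but relabelling does not need them to be: no bytes are converted.
body.force_encoding(Encoding::UTF_8)

puts compat.inspect
puts body.valid_encoding?
puts body.bytes == utf8_bytes
```

As long as the original bytes survive the first force_encoding, the second one recovers the text regardless of what Encoding.compatible? says.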

Vincent already answered this part; we’re still doing reinterpretation of what is essentially the original bytestream. Are there any encodings that map multiple byte sequences to the same code point? (And I’m not talking about Unicode NFC/NFD/&c.; that still makes it through the UTF-16 link alright.)

-Ben
_______________________________________________
MacRuby-devel mailing list
MacRuby-devel@lists.macosforge.org
http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
