> I DO think that strlen is not for unicode (actually multi-byte encoded case)
> string and is bad design: limited to single byte encoding.
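I think it's different from this. strlen counts bytes; to count characters you have to walk the string with something like mbrlen, which returns the byte length of the next multibyte character. In Smalltalk, #size returns allocation units: only if we stored everything in UTF-32 would this mean characters (UTF-16 would not suffice, because characters outside the BMP take two 16-bit units).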
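
To make the bytes-versus-characters distinction concrete, here is a minimal sketch in C. It assumes the input is valid UTF-8, and utf8_char_count is a hypothetical helper written for this message, not something from libc or GNU Smalltalk. It counts every byte that is not a continuation byte (10xxxxxx), which gives the number of code points:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the characters (code points) in a
   NUL-terminated string that is assumed to be valid UTF-8.
   Every byte that is not a continuation byte (10xxxxxx) starts
   a new character, so we count exactly those bytes. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the UTF-8 string "h\xc3\xa9" "llo" ("héllo"), strlen reports 6 bytes, while utf8_char_count reports 5 characters.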
> I DO think that
> modern language should consider unicode like string. I DO think Smalltalk is
> MODERN :-)
I do think that modern languages should support Unicode, and you're right that GNU Smalltalk (mostly) does not. But I don't think they should dismiss byte-based character encodings such as UTF-8. In my opinion these should remain the primary representation, especially when, as in UTF-8, there is no problem finding the first byte of a character (unlike JIS-0212 or GB-2312) and no need for escape sequences (unlike ISO-2022).
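
As a rough illustration of the "finding the first byte" property, the following C sketch backs up from an arbitrary byte offset to the start of the character that contains it. It assumes valid UTF-8 and an offset within the string; utf8_char_start is a hypothetical name used only for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given a byte offset i into valid UTF-8 text,
   return the offset of the first byte of the character containing it.
   Continuation bytes always look like 10xxxxxx, so we step backwards
   until we reach a byte that does not match that pattern. */
size_t utf8_char_start(const char *s, size_t i)
{
    assert(s != NULL);
    while (i > 0 && ((unsigned char)s[i] & 0xC0) == 0x80)
        i--;
    return i;
}
```

In "a\xe2\x82\xac" "b" ("a€b"), offsets 1, 2 and 3 all resolve to 1, the lead byte of the euro sign. No scan from the beginning of the string and no shift state is needed, which is what sets UTF-8 apart from the encodings mentioned above.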

Paolo


_______________________________________________
help-smalltalk mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/help-smalltalk
