Re: {Spam?} Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?)

Paolo Bonzini Fri, 07 Jul 2006 08:59:18 -0700

I DO think that strlen is not for Unicode (actually, multi-byte encoded)
strings and is bad design: it is limited to single-byte encodings.

I think it's different than this. strlen counts bytes. mbrlen counts characters. In Smalltalk #size returns allocation units: only if we stored everything in UTF-32 (no, UTF-16 would not suffice) would this mean characters.
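
To make the distinction concrete, here is a minimal sketch in Python (not GNU Smalltalk, and not how any Smalltalk implements #size). The sample text and the helper name unit_counts are made up for illustration. It counts the same text in code points and in UTF-8, UTF-16 and UTF-32 allocation units.

```python
# Sample text (an assumption for illustration): "naive" with a diaeresis,
# a space, and U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the BMP.
text = "na\u00efve \U0001D11E"


def unit_counts(s):
    """Return the number of code points and of UTF-8/16/32 storage units."""
    return {
        "code_points": len(s),                          # Python str is indexed by code point
        "utf8_bytes": len(s.encode("utf-8")),           # what a byte count such as strlen sees
        "utf16_units": len(s.encode("utf-16-le")) // 2, # 16-bit units, surrogate pairs count twice
        "utf32_units": len(s.encode("utf-32-le")) // 4, # one 32-bit unit per code point
    }


counts = unit_counts(text)
print(counts)
```

For this sample the counts are 7 code points, 11 UTF-8 bytes, 8 UTF-16 units and 7 UTF-32 units. Only the UTF-32 unit count agrees with the code point count; the UTF-16 count is off by one because the clef needs a surrogate pair.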

I DO think that
a modern language should treat strings as Unicode. I DO think Smalltalk is
MODERN :-)

I do think that modern languages should support Unicode and you're right that GNU Smalltalk (mostly) does not. I don't think they should dismiss character encodings based on bytes, like UTF-8. These should remain the primary representation in my opinion, especially if, like in UTF-8, you don't have any problem in finding the first byte of a character (unlike JIS-0212 or GB-2312) and no need for escape sequences (unlike ISO-2022).
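
The "finding the first byte" property can be sketched as follows, again in Python and assuming well-formed UTF-8 input. The function name char_start and the sample bytes are hypothetical. In UTF-8 every continuation byte has the bit pattern 10xxxxxx, so from any byte offset you can step backwards until you reach a byte that does not match it.

```python
def char_start(buf: bytes, i: int) -> int:
    """Return the offset of the first byte of the UTF-8 character containing buf[i].

    Assumes buf holds well-formed UTF-8 and 0 <= i < len(buf).
    """
    # Continuation bytes look like 10xxxxxx; lead bytes and ASCII never do.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i


# 'a', the euro sign U+20AC (three bytes: E2 82 AC), then 'b'.
data = "a\u20acb".encode("utf-8")
starts = [char_start(data, i) for i in range(len(data))]
print(starts)
```

Every offset inside the three-byte euro sign maps back to offset 1, without looking at anything before the character itself. Encodings where a trail byte can have the same value as a lead byte or an ASCII byte do not allow this kind of local resynchronization.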


Paolo


_______________________________________________
help-smalltalk mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/help-smalltalk
