Hi,

For me, a string should not be limited to a collection of single-byte characters. A string is a string, not a simple collection of bytes, isn't it? I think Squeak's approach is a good one (or OpenStep's, where there is an abstract public string class and concrete private subclasses that implement the different kinds of string). But since I'm not currently working much on GNU Smalltalk, this may not be the best idea for GNU Smalltalk's case :-)

PS) I DO think that strlen is not meant for Unicode strings (actually, for any multi-byte encoded string) and is a bad design: it is limited to single-byte encodings. I DO think that a modern language should support Unicode strings. I DO think Smalltalk is MODERN :-)

----- Original Message -----
From: "Paolo Bonzini" <[EMAIL PROTECTED]>
To: "Chun Sungjin" <[EMAIL PROTECTED]>
Cc: "GNU Smalltalk" <[email protected]>
Sent: Friday, July 07, 2006 6:17 PM
Subject: Re: {Spam?} Re: [Help-smalltalk] [Q] Unicode String?

> Chun Sungjin wrote:
> > Hi,
> >
> > main problem is that for example, if I did create an instance of
> > string like this;
> >
> > a := 'Some MultiByte Encoded String'.
> >
> > then
> >
> > a size
> >
> > does not answer correct length of string.
> Well, strlen does not in C, too. You need mbrlen, and #size is more
> like strlen than mbrlen.
>
> Also, the result heavily depends on the chosen character set. If we
> want to have #utf8Size, that's fine. But #size should be the number of
> *bytes*, not of characters.
>
> I'm seeing now if I can add an EncodedStream method that extracts
> Unicode characters. Then what you wanted would be something like
>
> (EncodedStream wordsOn: 'some string') contents size
>
> for which, of course, we can add a utility method.
>
> Paolo
>
_______________________________________________
help-smalltalk mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/help-smalltalk
