Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?)

Sungjin Chun Fri, 07 Jul 2006 08:15:07 -0700

Hi,

For me, a string should not be limited to a collection of single-byte
characters. A string is a string, not a simple collection of bytes, isn't it? I
think Squeak's approach (or OpenStep's approach, with an abstract public
string class and concrete private subclasses that implement the
several cases of string) is preferable. But I'm not currently working hard on
GNU Smalltalk, so this may not be the best idea for GNU Smalltalk's case :-)


PS)
I DO think that strlen is not meant for Unicode (actually, multi-byte encoded)
strings and is a bad design: it is limited to single-byte encodings. I DO think
that a modern language should take Unicode strings into account. I DO think
Smalltalk is MODERN :-)
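
To make the byte count vs. character count difference concrete, here is a
minimal C sketch. It assumes the input is valid UTF-8, and utf8_length is a
hypothetical helper written only for this illustration (it is not part of the
C library or of GNU Smalltalk). It counts every byte that is not a UTF-8
continuation byte, which for valid input gives the number of code points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Counts Unicode code points in a NUL-terminated string that is
   ASSUMED to be valid UTF-8: every byte except a continuation byte
   (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Mask the top two bits; 0x80 marks a continuation byte. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the two Hangul syllables encoded as the bytes "\xED\x95\x9C\xEA\xB8\x80",
strlen answers 6 (bytes) while utf8_length answers 2 (characters). A
locale-aware program would more likely loop over mbrlen instead, which works
for whatever multi-byte encoding the current locale uses.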

----- Original Message ----- 
From: "Paolo Bonzini" <[EMAIL PROTECTED]>
To: "Chun Sungjin" <[EMAIL PROTECTED]>
Cc: "GNU Smalltalk" <[email protected]>
Sent: Friday, July 07, 2006 6:17 PM
Subject: Re: {Spam?} Re: [Help-smalltalk] [Q] Unicode String?


> Chun Sungjin wrote:
> > Hi,
> >
> > The main problem is that, for example, if I create an instance of
> > string like this:
> >
> > a := 'Some MultiByte Encoded String'.
> >
> > then
> >
> > a size
> >
> > does not answer the correct length of the string.
> Well, strlen does not in C, either.  You need mbrlen, and #size is more
> like strlen than mbrlen.
>
> Also, the result heavily depends on the chosen character set.  If we
> want to have #utf8Size, that's fine.  But #size should be the number of
> *bytes*, not of characters.
>
> I'm seeing now if I can add an EncodedStream method that extracts
> Unicode characters.  Then what you wanted would be something like
>
>     (EncodedStream wordsOn: 'some string') contents size
>
> for which, of course, we can add a utility method.
>
> Paolo
>



_______________________________________________
help-smalltalk mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/help-smalltalk
