Re: [Help-smalltalk] [Q] Unicode String?

Paolo Bonzini Fri, 07 Jul 2006 00:03:32 -0700

Chun Sungjin wrote:

Hi,
I've tried GNU Smalltalk and it seems good to me. But I have a problem: the current implementation does not support Unicode. It seems that it supports single-byte characters only. I've also tried Squeak, which seems slower than GNU Smalltalk (I'm not sure about this, it might not be correct); it has a Unicode-compatible string implementation, and I think this kind of approach is good. Is there any chance of having a Unicode-compatible string implementation in the next version of GNU Smalltalk?

What do you need exactly? The main missing thing is support for Character objects with values above 255. However, if you are content with multibyte character sets like UTF-8, or with Unicode character codes, that's fine.

For character set translation, if you load the I18N package, GNU Smalltalk gets an iconv wrapper. The main method you need is EncodedStream>>#on:from:to: (e.g. on: 'abc' from: 'UTF-8' to: 'UCS-4').
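
As a rough sketch of the translation step described above: the package name and the on:from:to: selector come from this message, while the receiver (the EncodedStream class itself), the PackageLoader call used to load the package, and reading the result with upToEnd are assumptions that may differ between GNU Smalltalk versions.

```smalltalk
"Assumption: the I18N package is loaded through PackageLoader."
PackageLoader fileInPackage: 'I18N'.

"Wrap a UTF-8 string in a stream that converts it to UCS-4 through iconv.
 Assumption: on:from:to: is sent to the EncodedStream class."
(EncodedStream on: 'abc' from: 'UTF-8' to: 'UCS-4') upToEnd printNl.
```

The message does not say what class the converted data has or which byte order plain 'UCS-4' yields, so asking for 'UCS-4LE' or 'UCS-4BE' explicitly is probably safer when the bytes are decoded by hand afterwards.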

To extract Unicode character codes from a UCS-4LE encoded string, you can use (ByteStream on: x asByteArray) and send nextLong. For big-endian, there is no class, but I was thinking of adding a #bigEndian method to ByteStream for the next version.
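
To make the byte layout concrete, here is a minimal sketch that decodes UCS-4 by hand with plain integer arithmetic instead of ByteStream, so it covers the big-endian case too. The sample bytes and the variable names are made up for illustration; this is not how ByteStream is implemented.

```smalltalk
| le be codes |
"Hypothetical sample data: 'a' (U+0061) and the euro sign (U+20AC)."
le := #(97 0 0 0  172 32 0 0) asByteArray.    "UCS-4LE"
be := #(0 0 0 97  0 0 32 172) asByteArray.    "UCS-4BE"

"Little-endian: the first byte of each group of four is the least significant."
codes := OrderedCollection new.
1 to: le size by: 4 do: [:i |
    codes add: (le at: i)
        + ((le at: i + 1) bitShift: 8)
        + ((le at: i + 2) bitShift: 16)
        + ((le at: i + 3) bitShift: 24)].
codes printNl.    "holds the code values 97 and 8364"

"Big-endian: same arithmetic with the shifts reversed."
codes := OrderedCollection new.
1 to: be size by: 4 do: [:i |
    codes add: ((be at: i) bitShift: 24)
        + ((be at: i + 1) bitShift: 16)
        + ((be at: i + 2) bitShift: 8)
        + (be at: i + 3)].
codes printNl.    "holds the same two code values"
```

Neither loop checks that the input length is a multiple of four; real code would want to reject a truncated string.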


Things that could be useful are
   Integer>>#asUTF8String
   String class>>#utf8FromCodepoint: (same as above)
   String>>#utf8Stream
   UTF8Stream (returns Unicode character codes)
   ... (tell me what you need) ...
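
For illustration only, a method like String class>>#utf8FromCodepoint: could look roughly like the block below. This is a hypothetical sketch written as a plain block so it can be tried in a workspace; none of the names are existing GNU Smalltalk API, and it returns a ByteArray rather than a String to stay neutral about the final interface.

```smalltalk
"Hypothetical helper: encode one Unicode code point (0 to 16r10FFFF)
 as a ByteArray of UTF-8 bytes."
| utf8FromCodepoint |
utf8FromCodepoint := [:cp | | out cont |
    out := WriteStream on: (ByteArray new: 4).
    "Continuation byte: 10xxxxxx carrying six bits of cp taken at the given shift."
    cont := [:shift | 16r80 bitOr: ((cp bitShift: shift negated) bitAnd: 16r3F)].
    cp < 16r80
        ifTrue: [out nextPut: cp]    "one byte, plain ASCII"
        ifFalse: [
            cp < 16r800
                ifTrue: [out nextPut: (16rC0 bitOr: (cp bitShift: -6))]
                ifFalse: [
                    cp < 16r10000
                        ifTrue: [out nextPut: (16rE0 bitOr: (cp bitShift: -12))]
                        ifFalse: [
                            out nextPut: (16rF0 bitOr: (cp bitShift: -18)).
                            out nextPut: (cont value: 12)].
                    out nextPut: (cont value: 6)].
            out nextPut: (cont value: 0)].
    out contents].

(utf8FromCodepoint value: 16r20AC) printNl.    "euro sign: bytes 226 130 172"
```

The sketch does not reject surrogate code points (16rD800 to 16rDFFF) or values above 16r10FFFF, which a real implementation should do.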

Paolo


