Re: [Chicken-users] ditching syntax-case modules for the utf8 egg

Alaric Snell-Pym Tue, 18 Mar 2008 04:22:06 -0700


On 18 Mar 2008, at 2:29 am, Alex Shinn wrote:

> The problems we're having aren't even about string
> representation though, they're about the semantics of the
> string operations themselves.  Are the string indices byte
> positions or character positions?  Different libraries
> disagree.
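
To make the quoted question concrete, here is a small Java sketch (class and method names are hypothetical, chosen for illustration). It shows how the same string yields different lengths and different positions depending on whether you count UTF-8 bytes or characters.

```java
import java.nio.charset.StandardCharsets;

public class IndexDemo {
    // Length in bytes once the string is encoded as UTF-8.
    static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Length in Unicode code points, i.e. characters.
    static int charLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // UTF-8 byte position of the character at (UTF-16) index i.
    static int byteOffset(String s, int i) {
        return s.substring(0, i).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // "héllo": the e-acute takes 2 bytes in UTF-8
        System.out.println(byteLength(s));    // 6
        System.out.println(charLength(s));    // 5
        System.out.println(byteOffset(s, 2)); // 3: the first 'l' is character 2 but byte 3
    }
}
```

A library that indexes by bytes and one that indexes by characters would disagree about "position 2" in this string.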

IMHO Java does it more or less right (it falls down on the details,
though: it tends to assume that one UTF-16 code unit = one character, sigh).
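
A minimal Java sketch of that detail (helper names are hypothetical): `String.length()` counts UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two.

```java
public class CodeUnitDemo {
    // What String.length() reports: UTF-16 code units, not characters.
    static int codeUnits(String s) {
        return s.length();
    }

    // The number of actual characters (Unicode code points).
    static int characters(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is outside the Basic Multilingual
        // Plane, so UTF-16 stores it as a surrogate pair of two chars.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(codeUnits(clef));                           // 2
        System.out.println(characters(clef));                          // 1
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
    }
}
```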

As in, you have a byte type and a char type, and never the twain
shall meet, except that String (a wrapper around a char array with
stringy operations defined) has a getBytes method that takes an
encoding name and returns a byte array, and a constructor that takes
a byte array and an encoding name. There are versions, too, that don't
take an encoding name and instead use the "platform default
encoding" (e.g., on UNIX, it looks up the locale and works from that).
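
A rough Java sketch of that byte/char boundary, assuming a modern JDK. The `encode`/`decode` helpers are hypothetical wrappers around the real `String.getBytes(Charset)` and `String(byte[], Charset)`, with the charset looked up by name via `Charset.forName`.

```java
import java.nio.charset.Charset;

public class EncodeDemo {
    // Hypothetical helpers wrapping String.getBytes(Charset) and the
    // String(byte[], Charset) constructor; the charset is looked up by name.
    static byte[] encode(String s, String encodingName) {
        return s.getBytes(Charset.forName(encodingName));
    }

    static String decode(byte[] bytes, String encodingName) {
        return new String(bytes, Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        String text = "na\u00efve"; // "naïve"

        byte[] utf8 = encode(text, "UTF-8");
        byte[] latin1 = encode(text, "ISO-8859-1");
        System.out.println(utf8.length);   // 6: the i-diaeresis is 2 bytes in UTF-8
        System.out.println(latin1.length); // 5: but only 1 byte in Latin-1

        System.out.println(decode(utf8, "UTF-8").equals(text)); // true
        // Decoding with the wrong encoding silently yields different characters.
        System.out.println(decode(utf8, "ISO-8859-1").length()); // 6

        // The no-argument getBytes() uses the platform default encoding
        // (locale-derived on older JDKs; UTF-8 by default since Java 18).
        byte[] platform = text.getBytes();
        System.out.println(Charset.defaultCharset() + ": " + platform.length + " bytes");
    }
}
```

The point of the design is that the only way across the boundary is to name an encoding (or accept the default), so "is this a byte index or a character index?" never arises.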

So when you read from a file, you get bytes, but if you ask, they'll
be converted to characters using the encoding you name (or the
platform default), and so on.
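
A sketch of the file case in current Java (the NIO `Files` API postdates this post; `ReadDemo` and its methods are hypothetical names). The file yields bytes, and you get characters either by decoding the whole byte array or by wrapping the stream in an `InputStreamReader` with an explicit charset.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadDemo {
    // Option 1: read the raw bytes, then decode them explicitly.
    static String decodeWhole(Path p) throws IOException {
        byte[] raw = Files.readAllBytes(p);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Option 2: wrap the byte stream in a Reader that decodes as it goes.
    static String decodeStreaming(Path p) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(Files.newInputStream(p), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Writes s to a temporary file as UTF-8, reads it back both ways,
    // checks that the two readers agree, and returns the decoded text.
    static String roundTrip(String s) {
        try {
            Path tmp = Files.createTempFile("demo", ".txt");
            try {
                Files.write(tmp, s.getBytes(StandardCharsets.UTF_8));
                String whole = decodeWhole(tmp);
                if (!whole.equals(decodeStreaming(tmp))) {
                    throw new IllegalStateException("readers disagree");
                }
                return whole;
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 5 bytes on disk
        System.out.println(roundTrip(text).length());                     // 4 chars after decoding
    }
}
```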

ABS

--
Alaric Snell-Pym
Work: http://www.snell-systems.co.uk/
Play: http://www.snell-pym.org.uk/alaric/
Blog: http://www.snell-pym.org.uk/?author=4

_______________________________________________
Chicken-users mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/chicken-users
