On 18 Mar 2008, at 2:29 am, Alex Shinn wrote:
> The problems we're having aren't even about string representation though, they're about the semantics of the string operations themselves. Are the string indices byte positions or character positions? Different libraries disagree.
IMHO Java does it more or less right (though it falls down on the details; it tends to assume that one UTF-16 code unit = 1 character, sigh). As in, you have a byte type and a char type, and never the twain shall meet, except that String (a wrapper around a char array with stringy operations defined) has an encoding method (getBytes) that takes an encoding name and returns a byte array, and a constructor that takes a byte array and an encoding name. There are versions, too, that don't take an encoding name and instead use the "platform default encoding" (e.g., on UNIX, it looks up the locale and works from that).

So when you read from a file, you get bytes, but if you ask, they'll be converted to characters, etc.

ABS

--
Alaric Snell-Pym
Work: http://www.snell-systems.co.uk/
Play: http://www.snell-pym.org.uk/alaric/
Blog: http://www.snell-pym.org.uk/?author=4
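P.S. For concreteness, here is a minimal sketch of the byte/char split in Java. The class name EncodingSketch and the helpers toUtf8/fromUtf8 are made up for illustration. It uses StandardCharsets.UTF_8 rather than an encoding name string, only to avoid the checked exception that getBytes("UTF-8") declares; the no-argument getBytes() is the "platform default encoding" variant. It also shows the UTF-16 wrinkle: length() counts code units, not characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of any library.
class EncodingSketch {
    // chars -> bytes: the encoding is chosen explicitly at the boundary
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // bytes -> chars: the constructor takes the bytes plus an encoding
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "naive" with a diaeresis on the i, a space, then U+1D11E
        // (musical G clef), which lies outside the Basic Multilingual
        // Plane and so needs a surrogate pair in UTF-16.
        String s = "na\u00efve \uD834\uDD1E";

        byte[] utf8 = toUtf8(s);
        System.out.println(utf8.length);                      // 11 bytes in UTF-8
        System.out.println(s.length());                       // 8 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));  // 7 code points
        System.out.println(fromUtf8(utf8).equals(s));         // true (round trip)
    }
}
```

The three different counts (11, 8, 7) for the same string are exactly the "byte positions or character positions?" question: an index means something different depending on which layer you are counting in.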
