Shawn Rutledge scripsit:

> But you would want the usual string operations to work with either
> kind of string, right?

Indeed.

> It could follow from the general principle of separating metadata from
> data: Put the encoding in the extended attributes of the file, or
> resource fork if you've got one.

Specifically, the 8-BOM interferes with the ability of ASCII-aware but
8-bit clean programs to treat UTF-8 the same as ASCII. When they expect
to see something specific (like #!) at the beginning, they see the 8-BOM
instead and barf.

I'm all in favor of the 16-BOM, where there are no such issues, and it
also serves to reliably flag UTF-16/UCS-2 and to allow for variable
endianism. Same with the 32-BOM, if anyone bothers to use UTF-32 for
interchange.

> I thought it was still a reasonable assumption most of the time,

Except when it isn't. ASCII is a reasonable assumption most of the
time, except when it isn't.

> Or have 4 types of strings: byte (restricted strings), UTF-8, and
> fixed-char-size 16- and 24-bit strings.

Check out
http://larceny.ccs.neu.edu/larceny-trac/wiki/StringRepresentations ,
then let's talk, if there's anything left to talk about. :-)

-- 
We are lost, lost. No name, no business, no Precious, nothing. Only
empty. Only hungry: yes, we are hungry. A few little fishes, nassty
bony little fishes, for a poor creature, and they say death. So wise
they are; so just, so very just. --Gollum
http://ccil.org/~cowan
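
To make the 8-BOM complaint above concrete, here is a minimal Python sketch. It is only an illustration: the helper names (`starts_with_shebang`, `strip_utf8_bom`, `detect_utf16_byte_order`) are hypothetical, and the two-byte prefix test is an assumed stand-in for what an ASCII-aware, 8-bit clean program does, not the code of any real loader.

```python
import codecs

SHEBANG = b"#!"


def starts_with_shebang(data: bytes) -> bool:
    """Byte-level check, as an ASCII-aware but 8-bit clean tool might do it."""
    return data.startswith(SHEBANG)


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def detect_utf16_byte_order(data: bytes):
    """Return 'little' or 'big' from a leading 16-BOM, else None.

    Simplification: a UTF-32 little-endian BOM also begins with FF FE,
    and this sketch does not try to tell the two apart.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    return None


script = "#!/bin/sh\necho hi\n".encode("utf-8")
script_with_bom = codecs.BOM_UTF8 + script

plain_ok = starts_with_shebang(script)
bom_ok = starts_with_shebang(script_with_bom)
stripped_ok = starts_with_shebang(strip_utf8_bom(script_with_bom))
```

The plain script passes the check, the BOM-prefixed one fails it because its first bytes are EF BB BF rather than `#!`, and stripping the BOM restores the expected prefix. The 16-BOM case is different in kind: there the mark carries information (byte order) that the reader actually needs.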
