> Neil Jennings writes:
> |
> | The abc 2.0 draft has the following
> |
> | %%abc-charset iso-8859-1 (or other iso code)
>
> Well, yes, but that doesn't seem to have a well-defined scope. Does
> it apply to the whole file? If I have a text that's mixed Russian and
> Yiddish (not a hypothetic case), how do I indicate which parts are in
> which character set?

Unicode can encode anything, so if there is any character that can't be expressed in ASCII, the whole file could be Unicode. Then the switching isn't a problem.

Perhaps start the file with one of three things:

1) nothing - the file is ASCII
2) the string "ASCII", in the ASCII character set - the file is ASCII
3) the string "UNICODE", in the Unicode character set - the file is Unicode.

These tags can be interspersed, too, anywhere that an X: tag is legal, so that one could concatenate files of different types and the reader would switch on the fly. A file of type 1 couldn't be concatenated directly onto a Unicode file, though, but one could first concatenate another file that contained only the string "ASCII".

> Also, there's a potentially very serious gotcha with this sort of
> charset indicator: What if I copy the file to another machine
> (perhaps via a browser, or maybe with a file-copy program), and it
> decides to rewrite the file to a native charset on the new machine.
> It will, of course, translate the above %% line to the new charset,
> but it will still claim that the text is iso-8859-1, and that's now
> wrong.

I've never heard of an OS doing that. They don't even automatically translate between \r\n, \r, and \n.

Paul Rosen
--- Life is a musical, every once in a while the plot stops and you start singing and dancing ---
http://home.earthlink.net/~paulerosen/brbb/
http://home.earthlink.net/~theplums/
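
P.S. For what it's worth, here is a rough Python sketch of how a reader might handle the "ASCII" / "UNICODE" tags suggested above. It is only an illustration, not anything from the abc draft. The assumptions are mine: "Unicode" is taken to mean UTF-16 little-endian with no byte-order mark, each tag sits on a line of its own terminated by \n, and the function names are made up for the example.

```python
# Sketch only: the tag strings come from the proposal above; everything else
# (UTF-16LE as "Unicode", newline-terminated tags, the names) is an assumption.
ASCII_TAG = "ASCII\n".encode("ascii")
UNICODE_TAG = "UNICODE\n".encode("utf-16-le")


def _line_end(data: bytes, pos: int, mode: str) -> int:
    """Return the index just past the next newline, scanning whole code units."""
    width = 1 if mode == "ascii" else 2
    newline = "\n".encode(mode)
    i = pos
    while i + width <= len(data):
        if data[i:i + width] == newline:
            return i + width
        i += width
    return len(data)


def decode_tagged(data: bytes) -> str:
    """Decode a byte stream that switches encoding at ASCII/UNICODE tag lines."""
    mode = "ascii"  # case 1: no tag at all means the file is ASCII
    pos = 0
    pieces = []
    while pos < len(data):
        # Tags are only looked for at the start of a line.
        if data.startswith(UNICODE_TAG, pos):
            mode = "utf-16-le"
            pos += len(UNICODE_TAG)
            continue
        if data.startswith(ASCII_TAG, pos):
            mode = "ascii"
            pos += len(ASCII_TAG)
            continue
        end = _line_end(data, pos, mode)
        # Strict decoding: a non-ASCII byte in an ASCII section raises an error.
        pieces.append(data[pos:end].decode(mode))
        pos = end
    return "".join(pieces)


if __name__ == "__main__":
    sample = (
        b"X:1\nT:Plain title\n"
        + UNICODE_TAG
        + "T:\u0424\u0440\u0435\u0439\u043b\u0435\u0445\u0441\n".encode("utf-16-le")
        + ASCII_TAG
        + b"K:D\n"
    )
    print(decode_tagged(sample))
```

One weakness this makes visible: inside a Unicode section, the six bytes of the "ASCII" tag could in principle also be a legitimate run of UTF-16 characters at the start of a line, so a real design would want a tag that can't be mistaken for text.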