> Neil Jennings writes:
> |
> | The abc 2.0 draft has the following
> |
> | %%abc-charset iso-8859-1  (or other iso code)
>
> Well, yes, but that doesn't seem to have a well-defined scope.   Does
> it apply to the whole file? If I have a text that's mixed Russian and
> Yiddish (not a hypothetical case), how do I indicate which parts are in
> which character set?

Unicode can encode anything, so if there is any character that can't be
expressed in ASCII, the whole file could be Unicode. Then switching
character sets within a file isn't a problem.

Perhaps start the file with one of three things:

1) nothing - the file is ASCII
2) the string "ASCII", in the ASCII character set - the file is ASCII
3) the string "UNICODE", in the Unicode character set - the file is Unicode.

These tags could be interspersed, too, anywhere that an X: tag is legal, so
that one could concatenate files of different types and the reader would
switch on the fly.

A file of type 1 (no marker) couldn't be appended directly to a Unicode
file, though, since nothing would switch the reader back to ASCII. One
could first concatenate another file that contained only the string "ASCII".
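
To make the idea concrete, here is a rough sketch of how a reader might
handle the markers. It is only an illustration under my own assumptions:
"Unicode" is taken to mean UTF-8, each marker sits alone on its own line,
and the name decode_tagged is made up for this example. None of this is
in the abc 2.0 draft.

```python
# Sketch only: marker lines "ASCII" / "UNICODE" switch the encoding used
# for the lines that follow. Assumes "Unicode" means UTF-8 and that each
# marker stands alone on its own line (both are assumptions, not spec).
ENCODINGS = {b"ASCII": "ascii", b"UNICODE": "utf-8"}


def decode_tagged(data: bytes) -> str:
    """Decode bytes whose encoding is switched by marker lines."""
    encoding = "ascii"  # case 1: no marker at the start means ASCII
    out = []
    for raw in data.splitlines(keepends=True):
        marker = raw.strip()
        if marker in ENCODINGS:
            # Switch on the fly and drop the marker line itself.
            encoding = ENCODINGS[marker]
            continue
        # Raises UnicodeDecodeError if, say, a non-ASCII byte shows up
        # in a section that was declared (or defaulted) as ASCII.
        out.append(raw.decode(encoding))
    return "".join(out)


# Concatenating an unmarked ASCII file, a Unicode file, and an "ASCII"
# marker followed by more ASCII text:
plain = b"X:1\nT:Plain tune\n"
mixed = "UNICODE\nX:2\nT:\u0424\u0440\u0435\u0439\u043b\u0435\u0445\u0441\n".encode("utf-8")
back = b"ASCII\nX:3\nT:Another plain tune\n"
text = decode_tagged(plain + mixed + back)
```

With this reading, an unmarked file placed after a Unicode file would
still be decoded as UTF-8, which is why the separate "ASCII" marker file
would be needed in between.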

> Also, there's a potentially very serious gotcha  with  this  sort  of
> charset  indicator:  What  if  I  copy  the  file  to another machine
> (perhaps via a browser, or maybe with a file-copy  program),  and  it
> decides  to  rewrite the file to a native charset on the new machine.
> It will, of course, translate the above %% line to the  new  charset,
> but  it  will still claim that the text is iso-8859-1, and that's now
> wrong.

I've never heard of an OS doing that. They don't even translate
automatically between \r\n, \r, and \n line endings.

Paul Rosen
--- Life is a musical, every once in a while
      the plot stops and you start singing and dancing ---
http://home.earthlink.net/~paulerosen/brbb/
http://home.earthlink.net/~theplums/


