At 12:45 +0100 97/11/10, Kent Karlsson [EMAIL PROTECTED] wrote:

> As everyone (getting) familiar with Unicode should
> know, Unicode is **NOT** a font encoding.
> It is a CHARACTER encoding.  The difference
> shows up mostly for 'complex scripts', such as Arabic
> and Devanagari (used for Hindi), but also in the processing
> of combining characters for 'latin'.  Glyph (at a "font point")
> selection is based also on *neighbouring* characters.
>
> Unicode does have a number of compatibility characters,
> but the explicit intent is that they should only be used
> for backwards compatibility reasons.

  I leave it to the experts to figure out what exactly Unicode is. I can
only note that the idea of it as a character encoding breaks down
thoroughly in a mathematical context. I think the safest thing is to regard
it only as a set of glyphs, one that is better than other encodings because
it is ampler. I think working out the exact semantics of those glyphs is a
highly complex issue which cannot be fully resolved.
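
  To make the mathematical point a bit more concrete, here is a small
sketch in Python using only the standard unicodedata module. The helper
name fold_compat and the choice of the two sample symbols are mine, picked
purely for illustration. It suggests that folding compatibility characters
turns symbols with a distinct mathematical meaning into ordinary letters,
while canonical composition of a combining sequence keeps the meaning.

```python
import unicodedata

# Two letterlike symbols whose mathematical meaning differs from the
# plain letters they resemble (sample choice is illustrative only).
DOUBLE_STRUCK_C = "\u2102"  # DOUBLE-STRUCK CAPITAL C, the complex numbers
PLANCK_CONSTANT = "\u210e"  # PLANCK CONSTANT


def fold_compat(text: str) -> str:
    """Hypothetical helper: apply compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)


# Compatibility folding replaces both symbols with ordinary letters,
# so the mathematical distinction is lost.
folded_c = fold_compat(DOUBLE_STRUCK_C)
folded_h = fold_compat(PLANCK_CONSTANT)

# By contrast, canonical composition (NFC) only merges a base letter and
# a combining accent into the equivalent precomposed character.
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(repr(folded_c), repr(folded_h), len(decomposed), len(composed))
```

  Whether such folding is acceptable depends on the application; in a
mathematical text it would presumably not be.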

  There is also a Unicode version of TeX called Omega, which can perhaps
shed light on the font encoding issue for those interested. :-)

>B.t.w. Did you know...  that CR and LF should not be used
>in "newly produced" Unicode texts.  One should use Line
>Separator (U+2028) and Paragraph Separator (U+2029)
>instead.  Line Separator is the one expected to be used
>in program source files.

  Yes. A file that uses the Unicode Line Separator and Paragraph Separator
characters becomes platform independent. So old-style newlines should
probably also be used only for compatibility reasons.
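
  As a rough illustration of the separators mentioned above, the following
Python sketch converts old-style newlines to Line Separator (U+2028) and
splits a text on Paragraph Separator (U+2029). The function names
to_unicode_separators and split_paragraphs are hypothetical, and mapping
every legacy newline to a Line Separator (rather than guessing paragraph
breaks) is an assumption of mine, not something any standard prescribes.

```python
LINE_SEPARATOR = "\u2028"
PARAGRAPH_SEPARATOR = "\u2029"


def to_unicode_separators(text: str) -> str:
    """Hypothetical helper: map CRLF, CR and LF to LINE SEPARATOR.

    CRLF is replaced first so that it yields one separator, not two.
    """
    for legacy in ("\r\n", "\r", "\n"):
        text = text.replace(legacy, LINE_SEPARATOR)
    return text


def split_paragraphs(text: str) -> list[list[str]]:
    """Hypothetical helper: split into paragraphs, then into lines."""
    return [
        paragraph.split(LINE_SEPARATOR)
        for paragraph in text.split(PARAGRAPH_SEPARATOR)
    ]


# The same three lines as written on three different platforms.
converted = to_unicode_separators("a\r\nb\nc\rd")

sample = "first line\u2028second line\u2029new paragraph"
paragraphs = split_paragraphs(sample)

# str.splitlines() already treats both separators as line boundaries.
all_lines = sample.splitlines()

print(paragraphs)
print(all_lines)
```

  The converted text is the same whichever platform convention the input
used, which is the platform independence referred to above.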

  Hans Aberg
                  * Email: Hans Aberg <mailto:[EMAIL PROTECTED]>
                  * AMS member listing: <http://www.ams.org/cml/>



