Unicode encodes characters and defines various other things, but it does
not define the glyphs associated with the code point values. However, it is
important to acknowledge that the presentation layer of software should give
the current user the glyph variations that user expects. For instance,
especially in CJK regions, as Tomohiro described, a Unicode ideograph can be
rendered with different glyphs depending on who you are. It is important to
acknowledge such cultural differences and to try to support them, for
instance by using intelligent fonts and/or by binding the software to
something like the locale, so that glyphs are presented according to the
current user's locale as far as possible. That is necessary at least as a
starting point. Of course, more elaborate things could be done in the future
using in-band and out-of-band tagging, such as various markup languages and
Plane 14 characters.
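
As a rough illustration of binding glyph selection to the locale, here is a
minimal Python sketch. The table, the tag names ("ja", "zh-Hans", and so on),
and both function names are hypothetical; a real toolkit would pass such a
tag to its own font selection code. The only thing taken from POSIX is the
LC_ALL, LC_CTYPE, LANG lookup order.

```python
import os

# Hypothetical table: locale prefix -> glyph convention the user likely expects.
GLYPH_STYLE_BY_LOCALE = {
    "ja": "ja",
    "ko": "ko",
    "zh_CN": "zh-Hans",
    "zh_SG": "zh-Hans",
    "zh_TW": "zh-Hant",
    "zh_HK": "zh-Hant",
}


def current_locale_name(environ=os.environ):
    """Return the effective character-type locale name, POSIX style."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if value:
            return value
    return "C"


def glyph_style_for_locale(locale_name):
    """Map a name such as 'ja_JP.UTF-8' to a glyph style tag, or None."""
    # Drop the codeset (".UTF-8") and any modifier ("@euro").
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    if base in GLYPH_STYLE_BY_LOCALE:
        return GLYPH_STYLE_BY_LOCALE[base]
    # Fall back to the language part alone ("ja_JP" -> "ja").
    language = base.split("_", 1)[0]
    return GLYPH_STYLE_BY_LOCALE.get(language)


if __name__ == "__main__":
    name = current_locale_name()
    print(name, "->", glyph_style_for_locale(name))
```

Returning None for unknown locales leaves the choice of a default glyph set
to the caller instead of silently picking one region's conventions.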

As an example, I placed a TIFF file at the URL below that shows different
glyph variations for the same Unicode code point values. To produce it, I
started two dtterm terminal emulators, one in the ja_JP.UTF-8 locale and the
other in the zh_CN.UTF-8 locale, and then ran 'more /usr/pub/UTF-8' in both,
paging to U+4E00:

        http://ienup.tripod.com/cjk-glyph-variations.tiff
        
Note in particular glyphs such as U+4E08, U+4E10, U+4E12, U+4E41, U+4E62,
U+4EA4, U+4EC8, and so on; they are different. (I only glanced at a few lines
in the terminal emulators for a few seconds, so there are many more glyph
variations across the CJK region if you also consider Traditional Chinese,
Korean, and Vietnamese ideograph variations.)
Also, if you are a non-native speaker, missing strokes here and there or
writing strokes in the wrong direction may be acceptable, but if you are a
native speaker, people would not tolerate such departures from the norm, in
my understanding.
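
For anyone who wants to repeat the comparison without /usr/pub/UTF-8, the
following Python sketch prints the code points listed above. The names
CODE_POINTS and sample_lines are mine, for illustration only; run the script
in two terminals started under different locales (and therefore, presumably,
different fonts) and compare the output by eye.

```python
# Code points singled out above as rendering differently per locale.
CODE_POINTS = [0x4E08, 0x4E10, 0x4E12, 0x4E41, 0x4E62, 0x4EA4, 0x4EC8]


def sample_lines(code_points):
    """Return one 'U+XXXX  <character>' line per code point."""
    return ["U+%04X  %s" % (cp, chr(cp)) for cp in code_points]


if __name__ == "__main__":
    # The terminal must be in a UTF-8 locale for the characters to display.
    for line in sample_lines(CODE_POINTS):
        print(line)
```

The bytes written are identical in every locale; only the glyphs chosen by
the terminal's font differ.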

With regards,

Ienup


] Date: Sat, 03 Feb 2001 01:26:03 +0000 (GMT)
] From: Robert Brady <[EMAIL PROTECTED]>
] Subject: RE: XTerm and ISO-2022
] X-Sender: [EMAIL PROTECTED]
] To: [EMAIL PROTECTED]
] MIME-version: 1.0
] 
] On Sat, 3 Feb 2001, Tomohiro KUBOTA wrote:
] 
] > In Unicode, CJK characters with the same meaning and similar shape are
] > unified.  For example, U+9AA8 (ideograph 'bone') unifies 0x3947 from
] > GB2312 (Mainland China), 0x586C from CNS11643-1 (Taiwan), 0x397C from
] > JISX0208 (Japan), and 0x4D69 from KSX1001 (Korea).  However, though
] > these characters share a common origin, today they have different
] > shapes, and CJK people cannot tolerate the differences.  Note that none
] > of these characters are historic; all are in daily use.  Also note that
] > no future extension can fix this problem, because code points already
] > assigned in Unicode will not be changed.  (Moreover, if they were
] > changed, confusion would result.)
] 
] Are these differences any more significant than the differences between
] the following forms of Latin 
] 
]   * Roman
]   * Italic
]   * Fraktur
]   * Black Letter
]   * Handwriting-of-Markus-Kuhn-When-Quite-Drunk (which is registered as
]     -mk3-, I think, in my registry of ADD_STYLE_NAME entries).
] 
] Compare also the Polish form of the acute, which many insist is a
] different accent entirely.
] 
] The answer is a definite "no".
] 
] Unicode encodes characters, not glyphs.  The line has to be drawn
] somewhere. Your argument is an old one, and sadly mainly one that is used
] to justify fear of change.
] 
] -- 
] Robert Brady
] [EMAIL PROTECTED]
] 
] -
] Linux-UTF8:   i18n of Linux on all levels
] Archive:      http://mail.nl.linux.org/lists/