[Freefont-bugs] Discussion and questions on Unicode Han Unification

Ange Gapes Tue, 25 Jan 2011 22:52:19 -0800

Hello,

Sorry, this is not directly about bugs in Freefont, nor about development
matters, but I could not find a more general mailing list for your project.
I think this kind of discussion is still of interest; hopefully you will
agree.


I recently became interested in the Han unification project and the problems
it causes for texts that mix languages. As a font project, you surely know
the issues, but for those who don't, here is my summary: three main
languages use characters of Chinese origin (Han characters): Chinese,
Japanese and Korean (hence "CJK"), although Korean uses them little in
modern writing. The Unicode project decided to unify characters that share
the same origin into a single code point (Han Unification, or Unihan). This
causes problems when the actual written form differs from country to
country, sometimes slightly (a matter of style), sometimes more visibly. The
Wikipedia page has good examples of the issue:
http://en.wikipedia.org/wiki/Unihan#Examples_of_language_dependent_characters
(this is meaningful only if your computer has fonts that actually render the
differing forms).
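To make the unification concrete, here is a small Python sketch using only
the standard library. It takes 直 (U+76F4), a commonly cited example whose
preferred shape differs between Chinese and Japanese (the choice of
character is mine, for illustration), and shows that it is a single code
point with a single UTF-8 byte sequence:

```python
import unicodedata

# 直: one unified code point shared by Chinese, Japanese and Korean text.
ch = "\u76f4"

print(f"U+{ord(ch):04X}")    # U+76F4, the single code point
print(unicodedata.name(ch))  # CJK UNIFIED IDEOGRAPH-76F4
print(ch.encode("utf-8"))    # b'\xe7\x9b\xb4', the bytes stored in a text file
```

Nothing in the code point or in the bytes says whether the text is Chinese
or Japanese; that information has to come from somewhere else.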

The way it is dealt with is:
- if you use only one of these languages, you don't care: you simply choose
fonts that display the characters the way your language does.
- if you read texts in several languages, or even mixed within the same
text, the text can carry some kind of markup so that different fonts are
selected. This is how it is done in HTML, which is why the Wikipedia page I
mentioned shows different glyphs for what is actually the same Unicode
character.
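As a rough illustration of the markup approach, this Python sketch builds an
HTML fragment in which the same code point is tagged once as Simplified
Chinese and once as Japanese with the standard `lang` attribute; a browser
with suitable fonts installed may then pick different glyphs for each. The
helper name `tag_language` is hypothetical:

```python
import html


def tag_language(text: str, lang: str) -> str:
    """Wrap text in a <span> carrying a BCP 47 language tag (hypothetical helper)."""
    return f'<span lang="{html.escape(lang, quote=True)}">{html.escape(text)}</span>'


ch = "\u76f4"  # 直, the same code point in both spans
fragment = tag_language(ch, "zh-Hans") + " " + tag_language(ch, "ja")
print(fragment)  # <span lang="zh-Hans">直</span> <span lang="ja">直</span>
```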

But what about reading a raw text file without markup, for instance? The
editor has no reliable way to tell the language, and mixed characters won't
display correctly.

So why am I telling you all this? I would like to know your opinion, if not
your position, on this Unicode decision. Do you have any remarks on it?
Also, what does it mean for a project like yours? Is it possible, within a
single font family, to provide several designs for the same character along
with "context" information (e.g. this design is preferred for Chinese display
unless there is no other choice, this one for Japanese, and so on), plus
perhaps a default one (a "generic" design used when no context is
available)? That way, software using only your font could still display
different designs depending on the language being displayed (if it knows
it), or the default design otherwise...
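The mechanism asked about here resembles what OpenType offers through
language-system tags and the `locl` (localized forms) feature. The Python
sketch below only models the idea: a table keyed by (code point, language)
selects a glyph, falling back to a default design when the language is
unknown or has no variant. The glyph names and tables are hypothetical and
do not describe Freefont or any real font:

```python
from typing import Optional

# Hypothetical glyph tables: a default design plus per-language variants.
DEFAULT_GLYPHS = {0x76F4: "uni76F4"}
LOCALIZED_GLYPHS = {
    (0x76F4, "zh"): "uni76F4.zh",
    (0x76F4, "ja"): "uni76F4.ja",
}


def select_glyph(code_point: int, lang: Optional[str] = None) -> str:
    """Return a language-specific glyph if one exists, else the default design."""
    if lang is not None:
        primary = lang.split("-")[0].lower()  # "zh-Hans" -> "zh"
        glyph = LOCALIZED_GLYPHS.get((code_point, primary))
        if glyph is not None:
            return glyph
    return DEFAULT_GLYPHS[code_point]


print(select_glyph(0x76F4, "ja"))       # uni76F4.ja
print(select_glyph(0x76F4, "zh-Hans"))  # uni76F4.zh
print(select_glyph(0x76F4))             # uni76F4 (no context available)
```

In a real font this choice is made by the text-layout software, which must
be told the language; without that information it falls back to the
default, which is the raw-text case described above.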

On a side note, I read somewhere that similar problems may arise with other
kinds of characters. In particular, I read on a website about Arabic
characters being used in several countries/languages but displayed slightly
differently. After some searching, however, I could not find actual
information on this specific issue, so I don't know whether it is true, or
whether the Unicode project has since fixed it by assigning specific
characters, or control characters that change the display. (Arabic does not
have nearly as many characters as the East Asian languages, so duplicating
characters raises less of a space issue.)
Do you know about this Arabic-character issue? Or about similar issues with
glyphs in other scripts?
Do you participate in Unicode standardization? Do you have details on what
led to this decision internally? Is it really ONLY a space problem? Even
though these countries certainly use a lot of characters, it looks to me
like there are still plenty of unassigned slots (that is how Unicode was
designed, after all: with enough room for all of history, as far as I
know). So I don't see the point of keeping them free for no reason (it's not
as if entirely new scripts of hundreds of thousands of characters will
suddenly be created in the next century).
And in the worst case, Unicode could still be extended.
So if you have any particularly interesting links to discussions in the
Unicode project (mailing lists, maybe?) about how this came about, they would
be welcome as well.
I will also ask the Unicode people directly later, but I first wanted to
know the opinion of a font project whose goal is to cover all of Unicode.
What does this imply for you?
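Regarding the question of unassigned slots, some scale: the Unicode
codespace is fixed at 17 planes of 65,536 code points each (U+0000 to
U+10FFFF), of which 2,048 are reserved as UTF-16 surrogates. A quick Python
check of that arithmetic (it says nothing about how many code points are
currently assigned):

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000         # 65,536 per plane

total = PLANES * CODE_POINTS_PER_PLANE
surrogates = 0xDFFF - 0xD800 + 1        # U+D800..U+DFFF, reserved for UTF-16
scalar_values = total - surrogates      # code points usable for characters

print(total)                            # 1114112
print(hex(total - 1))                   # 0x10ffff, the highest code point
print(scalar_values)                    # 1112064
```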

And on a second level, why am I asking all this? First of all, simply
because I am interested in Unicode and in such questions, for personal use
but also out of pure intellectual interest (among other reasons, I am myself
involved in standardization processes, though not directly in Unicode, at
least for now). Also because I find this rather sad, and when I read about
it I did not much agree with such moves: the primary goal of Unicode was to
support every existing character, so this looks like a step backwards. Also,
we know that some countries (Japan at least, as far as I know) are not very
keen on standardization, so they do not use Unicode encodings such as UTF-8
much, preferring localized encodings, and this kind of move won't make them
want to change.
And also because I am beginning to write what may become a book some day;
not on this topic in particular, but this kind of topic may be part of it.
So thanks to all. Any opinions and information on the topic would be greatly
appreciated.

Ange

P.S.: And for personal use, one last question: do you plan to support these
East Asian characters in the foreseeable future? In particular Japanese
hiragana, katakana and kanji, and the basic Korean alphabet (Hangul)?
