Re: bbdb-print with weird coding systems

Alex Schroeder Mon, 23 Dec 2002 16:14:11 -0800

[EMAIL PROTECTED] (Kai Großjohann) writes:

>> Really? How is this the case? Or more to the point, what's the set of
>> characters in Emacs that can't be represented in Unicode?
>
> In Emacs, Latin-1 ä and Latin-2 ä are two distinct characters.  I
> think in Unicode there is only one ä.
>
> There was much talk about `Han unification'.  I have no idea what
> that is, but I /think/ it means that some Chinese characters are
> identified with some Japanese characters.  So from the Unicode
> you don't know which character it is.  (But applications might wish
> to use different glyphs depending on whether they're showing Chinese
> or Japanese text.)  Note that this is just hearsay -- it might be
> completely wrong.  In Emacs, the Chinese and the Japanese character
> are considered to be distinct.
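
The Latin-1/Latin-2 point in the quoted text can be checked outside
Emacs.  A minimal Python sketch, assuming the standard `iso8859_1` and
`iso8859_2` codecs: the same byte decodes to the same Unicode character
from either charset, so after decoding nothing records which charset
it came from.

```python
# The byte 0xE4 is "a with diaeresis" in both ISO 8859-1 and ISO 8859-2.
raw = b"\xe4"

from_latin1 = raw.decode("iso8859_1")
from_latin2 = raw.decode("iso8859_2")

# Both decodes land on the single Unicode code point U+00E4.
print(hex(ord(from_latin1)), hex(ord(from_latin2)))
print(from_latin1 == from_latin2)
```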


This is what happened.  There are characters that look very similar in
Japanese, simplified Chinese and traditional Chinese (yes, several
variants exist), and Unicode says that they are represented by the
same code point -- and that the app should use the "correct" font.
Obviously this requires saving the language information somewhere
else.  Now imagine you have a text file with both Japanese and Chinese
characters.  You cannot just save the file and pick one font; you
need even more information to render it correctly.
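
To make the unification concrete, here is a small Python sketch (not
Emacs code).  The character U+76F4 is my own choice of example; the
codec names are Python's standard ones.  The three national encodings
store it as different bytes, but all of them decode to one code point,
so the language has to be recorded somewhere outside the text.

```python
# U+76F4 exists in the Japanese, simplified Chinese and traditional
# Chinese national charsets (illustrative choice of character).
ch = "\u76f4"

# Encode it with one legacy coding system per language.
legacy = {name: ch.encode(name) for name in ("shift_jis", "gb2312", "big5")}

# Decode each byte sequence back to Unicode.
decoded = {name: data.decode(name) for name, data in legacy.items()}

for name, data in legacy.items():
    print(name, data.hex(), hex(ord(decoded[name])))
```

The byte sequences differ per coding system, while the decoded
character is identical in all three cases.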

What Emacs does instead is support all the coding systems and tie
them together with a "meta" coding system.  Two such exist, afaik:
ISO-2022 and emacs-mule.  Both work by marking which low-level
encoding follows (ISO-2022 uses longer escape sequences, emacs-mule
uses some special non-ASCII bytes).  Therefore no information is
lost -- but Emacs has to know all the other coding systems.
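
The escape-sequence idea can be seen with Python's `iso2022_jp` codec,
which is one member of the ISO-2022 family.  This is only a sketch of
the mechanism, not of what Emacs does internally; the sample string
and the `ESC_SEQ` pattern are my own.

```python
import re

# An ISO-2022 escape sequence: ESC, optional intermediate bytes
# (0x20-0x2F), then one final byte (0x30-0x7E).
ESC_SEQ = re.compile(rb"\x1b[\x20-\x2f]*[\x30-\x7e]")

text = "abc \u65e5\u672c xyz"  # ASCII, two kanji, ASCII again
encoded = text.encode("iso2022_jp")

# Each match announces which charset the following bytes belong to.
switches = ESC_SEQ.findall(encoded)

print(encoded)
print(switches)
```

The encoder emits one sequence to switch to the Japanese charset
before the kanji and another to switch back to ASCII afterwards, so
the byte stream itself says which charset each run belongs to.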

This is why Emacs treats Unicode as just another coding system -- this
works well, since the work for all the other coding systems has
already been done.  And using only Unicode would be lossy, because the
original charset of a unified character is gone.  So what Emacs lacks,
at the moment, is only the fancy Unicode stuff -- compositions,
alternate representations, etc.

> I think the new Unicode-based internal encoding in Emacs will offer
> some way around `Han unification', perhaps by using private extension
> areas in Unicode.

Perhaps.  But perhaps this will just mean that Emacs can "decode" all
the fancy Unicode stuff, and the internal emacs-mule coding system is
expanded to handle anything it cannot yet represent...  We would have
to ask Handa or other people on emacs-devel to know for sure.

Alex.

