Unicode in Emacs again

Oliver Doepner Fri, 26 Oct 2001 04:06:20 -0700

Hi,
I got a personal mail (a reply to my earlier mail) from Eli
Zaretskii, one of the maintainers of GNU Emacs. The team seems to be in
need of contributors to improve the Unicode support.
See below.


cheers
oliver

---------- Forwarded message ----------
Date: Fri, 26 Oct 2001 10:39:50 +0200
From: Eli Zaretskii <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: Unicode in Emacs

[..]

> > We need people who are prepared to work on implementing the Unicode
> > features in Emacs.  Ideas are welcome, but right now we have more ideas
> > than we can manage.
> 
> Maybe after my exams. But my elisp skills are poor. :-(

Don't worry, there are lots of ways you could contribute.
How's your C?  Some of the work (a large part of it, actually) needs
to be done on the C level.
Even if you can only read the code (in C or Lisp) and suggest how to
modify it to implement various Unicode features, it's still useful, as
others could write the code given your advice.

> But i have done some superficial "research" on m17n, i18n and l10n ...
> see http://www.coli.uni-sb.de/~oldo/PUB/mls-reqspec.html ...

Impressive. We certainly need that kind of overall view of the issues
when discussing and coding Unicode support in Emacs.  Would you like
to be subscribed to the emacs-unicode mailing list?

> > mappings, bidirectional editing, Arabic presentation forms, etc.  Does
> > vim really support those?
> combining characters ... hm ahem ... http://vim.sourceforge.net/whyvim.php

Thanks for the pointer.  However, I cannot see anything there besides
support for UTF-8 and other Unicode encodings.  Nothing about
combining characters, and the Hebrew text in the snapshot is in the
wrong direction (which means bidirectional behavior specified by the
Unicode Technical Report #9 isn't supported).  So I guess Unicode
support in VIM is still very preliminary, although better than
Emacs's.

> Are you on linux-utf8 ??

No.  I don't have enough time to read another mailing list, sorry.

> Who knows the design that has been made for the unicode support?

I attach it below.

> Maybe these people could join the linux-utf8 list and
> advertise their concept and hopefully some others might "step in" to
> contribute some code.

I'd prefer that people who want to work on adding Unicode to Emacs
subscribe to emacs-unicode, and that the work be coordinated there.
Please feel free to forward this suggestion to linux-utf8.

> Where is the emacs-unicode mailing list??

It's hosted on gnu.org machines.  I can subscribe anyone who wants to
be part of this effort.

> thank you for all the valuable work you do!

You are welcome.  And thanks for raising this important issue: it
looks like a few people are interested enough that they wrote to me
and offered help.



        Emacs-Unicode-990824
----------------------------------------------------------------------
Internal Character code:

  00 0000 xxxxxxxx xxxxxxxx   Unicode U+0000 - U+FFFF
  00 xxxx xxxxxxxx xxxxxxxx   Unicode 20bit (via surrogate pair)
  01 0000 xxxxxxxx xxxxxxxx   Unicode 20bit (via surrogate pair)
  01 0ppp xxxxxxxx xxxxxxxx   7 64kByte planes reserved for Emacs
  01 1ppp xxxxxxxx xxxxxxxx   8 64kByte planes for private use
  1x xxxx xxxxxxxx xxxxxxxx   for private use, CNS 3-16, and CCCII

        Private area is 180000h - 3087FFh
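
The layout above can be read as a partition of the 22-bit code space.
What follows is only a minimal sketch of that reading, not code from the
Emacs sources: the name `classify` and the range labels are made up for
illustration, and the split of the upper private/CNS/CCCII area uses the
boundaries given in the multibyte table of this same note.

```python
# Illustrative only: ranges transcribed from the Emacs-Unicode-990824 note.
# The function name and the labels are hypothetical, not Emacs identifiers.
RANGES = [
    (0x000000, 0x00FFFF, "Unicode U+0000 - U+FFFF"),
    (0x010000, 0x10FFFF, "Unicode via surrogate pair"),
    (0x110000, 0x17FFFF, "reserved for Emacs"),
    (0x180000, 0x3087FF, "private use"),
    (0x308800, 0x327FFF, "CNS planes 3-16"),
    (0x328000, 0x3FFFFF, "CCCII"),
]


def classify(code: int) -> str:
    """Return the label of the range that contains a 22-bit internal code."""
    if not 0 <= code <= 0x3FFFFF:
        raise ValueError("internal character codes are 22 bits wide")
    for low, high, label in RANGES:
        if low <= code <= high:
            return label
    raise AssertionError("ranges are contiguous, so this is unreachable")
```

The ranges are contiguous, so every value from 0h to 3FFFFFh falls into
exactly one of them.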

----------------------------------------------------------------------
Multibyte sequence in buffer/string:

  1 byte: xxxxxxxx
    0xxxxxxx
        ASCII
    1xxxxxxx
        not used

  2 bytes: 110xxxxx 10xxxxxx where x... are:
    00000 000000 - 00001 111111 (0h - 7Fh)
        7 bits not used
        (or we may be able to use this area for holding 8-bit raw data
         in multibyte buffer/string)
    00010 000000 - 11111 111111 (80h - 7FFh)
        Unicode U+0080 - U+07FF

  3 bytes: 1110xxxx 10xxxxxx 10xxxxxx where x... are:
    0000 000000 000000 - 0000 011111 111111 (0h - 7FFh)
        11 bits not used
    0000 100000 000000 - 1111 111111 111111 (800h - FFFFh)
        Unicode U+0800 - U+FFFF

  4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx where x... are:
    000 000000 000000 000000 - 000 001111 111111 111111 (0h - FFFFh)
        16 bits not used
    000 010000 000000 000000 - 100 001111 111111 111111 (10000h - 10FFFFh)
        20 bits Unicode via surrogate pair
    100 010000 000000 000000 - 101 111111 111111 111111 (110000h - 17FFFFh)
        7 64kByte planes reserved for Emacs
        We may map Japanese Han characters here.
    110 000000 000000 000000 - 111 111111 111111 111111 (180000h - 1FFFFFh)
        8 64kByte planes reserved for private use

  5 bytes: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx where x... are:
    00 000000 000000 000000 000000 - 00 000111 111111 111111 111111
                                0h - 1FFFFFh
        21 bits not used
    00 001000 000000 000000 000000 - 00 001100 001000 011111 111111
                           200000h - 3087FFh
        1083392 (just over 1M) character code points for private use
    00 001100 001000 100000 000000 - 00 001100 100111 111111 111111
                           308800h - 327FFFh
        CNS Planes 3 to 16 (96*96*14)
    00 001100 101000 000000 000000 - 00 001111 111111 111111 111111
                           328000h - 3FFFFFh
        CCCII (96*96*96)
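
The multibyte table describes a UTF-8-style scheme extended to five
bytes. A minimal sketch of an encoder and decoder for it is given here,
under stated assumptions: the names `encode` and `decode` are
hypothetical, and the decoder simply rejects every range the table marks
"not used", which also rules out the optional use of the 2-byte 0h - 7Fh
area for raw 8-bit data.

```python
# Sketch of the multibyte scheme in the Emacs-Unicode-990824 note.
# encode/decode are hypothetical names, not functions from Emacs.

# (sequence length, lead-byte marker, smallest code allowed at that length)
FORMS = [
    (2, 0xC0, 0x80),
    (3, 0xE0, 0x800),
    (4, 0xF0, 0x10000),
    (5, 0xF8, 0x200000),
]
MAX_CODE = 0x3FFFFF  # internal character codes are 22 bits wide


def encode(code: int) -> bytes:
    """Encode an internal character code as the shortest multibyte sequence."""
    if not 0 <= code <= MAX_CODE:
        raise ValueError("code outside 0h - 3FFFFFh")
    if code < 0x80:
        return bytes([code])  # ASCII is stored as a single byte
    # Pick the longest form whose minimum does not exceed the code.
    length, marker, _ = [f for f in FORMS if code >= f[2]][-1]
    out = bytearray()
    for _ in range(length - 1):
        out.append(0x80 | (code & 0x3F))  # trailing byte: 10xxxxxx
        code >>= 6
    out.append(marker | code)  # lead byte carries the remaining high bits
    out.reverse()
    return bytes(out)


def decode(seq: bytes) -> int:
    """Decode one multibyte sequence, rejecting the 'not used' ranges."""
    if not seq:
        raise ValueError("empty sequence")
    lead = seq[0]
    if lead < 0x80:
        length, code, minimum = 1, lead, 0
    elif lead & 0xE0 == 0xC0:
        length, code, minimum = 2, lead & 0x1F, 0x80
    elif lead & 0xF0 == 0xE0:
        length, code, minimum = 3, lead & 0x0F, 0x800
    elif lead & 0xF8 == 0xF0:
        length, code, minimum = 4, lead & 0x07, 0x10000
    elif lead & 0xFC == 0xF8:
        length, code, minimum = 5, lead & 0x03, 0x200000
    else:
        raise ValueError("invalid lead byte")
    if len(seq) != length:
        raise ValueError("wrong sequence length for this lead byte")
    for byte in seq[1:]:
        if byte & 0xC0 != 0x80:
            raise ValueError("invalid trailing byte")
        code = (code << 6) | (byte & 0x3F)
    if code < minimum or code > MAX_CODE:
        raise ValueError("code lies in a range the table marks as not used")
    return code
```

For codes up to 10FFFFh the output is byte-for-byte what standard UTF-8
produces; the two schemes differ only above that, where standard UTF-8
stops and this design keeps going into the Emacs and private areas.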

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
