Hi, I got a personal mail (a reply to my earlier mail) from Eli Zaretskii, one of the maintainers of GNU Emacs. The team seems to need contributors to improve the Unicode support. See below.
Cheers,
Oliver

---------- Forwarded message ----------
Date: Fri, 26 Oct 2001 10:39:50 +0200
From: Eli Zaretskii <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: Unicode in Emacs

[..]

> > We need people who are prepared to work on implementing the Unicode
> > features in Emacs.  Ideas are welcome, but right now we have more ideas
> > than we can manage.
>
> Maybe after my exams. But my elisp skills are poor. :-(

Don't worry, there are lots of ways you could contribute.  How's your
C?  Some of the work (a large part of it, actually) needs to be done on
the C level.  Even if you can only read the code (in C or Lisp) and
suggest how to modify it to implement various Unicode features, it's
still useful, as others could write the code given your advice.

> But I have done some superficial "research" on m17n, i18n and l10n ...
> see http://www.coli.uni-sb.de/~oldo/PUB/mls-reqspec.html ...

Impressive.  We certainly need that kind of overall view of the issues
when discussing and coding Unicode support in Emacs.  Would you like to
be subscribed to the emacs-unicode mailing list?

> > mappings, bidirectional editing, Arabic presentation forms, etc.  Does
> > vim really support those?
> combining characters ... hm ahem ...
> http://vim.sourceforge.net/whyvim.php

Thanks for the pointer.  However, I cannot see anything there besides
support for UTF-8 and other Unicode encodings.  Nothing about combining
characters, and the Hebrew text in the snapshot is in the wrong
direction (which means the bidirectional behavior specified by the
Unicode Technical Report #9 isn't supported).  So I guess Unicode
support in VIM is still very preliminary, although better than Emacs's.

> Are you on linux-utf8?

No.  I don't have enough time to read another mailing list, sorry.

> Who knows the design that has been made for the Unicode support?

I attach it below.

> Maybe these people could join the linux-utf8 list and
> advertise their concept and hopefully some others might "step in" to
> contribute some code.
I'd prefer that people who want to work on adding Unicode to Emacs
subscribe to emacs-unicode, and that the work be coordinated there.
Please feel free to forward this suggestion to linux-utf8.

> Where is the emacs-unicode mailing list?

It's hosted on gnu.org machines.  I can subscribe anyone who wants to
be part of this effort.

> Thank you for all the valuable work you do!

You are welcome.  And thanks for raising this important issue: it
looks like a few people are interested enough that they wrote to me
and offered help.


Emacs-Unicode-990824
----------------------------------------------------------------------
Internal Character code:

  00 0000 xxxxxxxx xxxxxxxx   Unicode U+0000 - U+FFFF
  00 xxxx xxxxxxxx xxxxxxxx   Unicode 20bit (via surrogate pair), xxxx != 0000
  01 0000 xxxxxxxx xxxxxxxx   Unicode 20bit (via surrogate pair)
  01 0ppp xxxxxxxx xxxxxxxx   7 64kByte planes reserved for Emacs, ppp != 000
  01 1ppp xxxxxxxx xxxxxxxx   8 64kByte planes for private use
  1x xxxx xxxxxxxx xxxxxxxx   for private use, CNS 3-16, and CCCII

  Private area is 180000h - 3087FFh
----------------------------------------------------------------------
Multibyte sequence in buffer/string:

1 byte: xxxxxxxx
  0xxxxxxx  ASCII
  1xxxxxxx  not used

2 bytes: 110xxxxx 10xxxxxx
  where x... are:
  00000 000000 - 00001 111111  (0h - 7Fh)
      7 bits, not used (or we may be able to use this area for
      holding 8-bit raw data in multibyte buffer/string)
  00010 000000 - 11111 111111  (80h - 7FFh)
      Unicode U+0080 - U+07FF

3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
  where x... are:
  0000 000000 000000 - 0000 011111 111111  (0h - 7FFh)
      11 bits, not used
  0000 100000 000000 - 1111 111111 111111  (800h - FFFFh)
      Unicode U+0800 - U+FFFF

4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  where x... are:
  000 000000 000000 000000 - 000 001111 111111 111111  (0h - FFFFh)
      16 bits, not used
  000 010000 000000 000000 - 100 001111 111111 111111  (10000h - 10FFFFh)
      20 bits, Unicode via surrogate pair
  100 010000 000000 000000 - 101 111111 111111 111111  (110000h - 17FFFFh)
      7 64kByte planes reserved for Emacs
      We may map Japanese Han characters here.
  110 000000 000000 000000 - 111 111111 111111 111111  (180000h - 1FFFFFh)
      8 64kByte planes reserved for private use

5 bytes: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
  where x... are:
  00 000000 000000 000000 000000 - 00 000111 111111 111111 111111  (0h - 1FFFFFh)
      21 bits, not used
  00 001000 000000 000000 000000 - 00 001100 001000 011111 111111  (200000h - 3087FFh)
      1083392 (almost 1M) character code points for private use
  00 001100 001000 100000 000000 - 00 001100 100111 111111 111111  (308800h - 327FFFh)
      CNS Planes 3 to 16 (96*96*14)
  00 001100 101000 000000 000000 - 00 001111 111111 111111 111111  (328000h - 3FFFFFh)
      CCCII (96*96*96)

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
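The two tables in Emacs-Unicode-990824 can be checked mechanically. Below is a minimal C sketch, written for illustration only. It encodes an internal character code into the shortest multibyte sequence and decodes one back. It also reports which row of the "Internal Character code" table a code falls in.

It rests on a few assumptions:
- The function names char_to_multibyte, multibyte_to_char and classify_char, the enum char_range, and the constant MAX_CHAR are hypothetical. They are not taken from the Emacs sources.
- Every range the note marks "not used" is treated as invalid. This includes the 2-byte 0h - 7Fh area, which the note suggests might later hold raw 8-bit bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: highest internal code, 2 + 4 + 16 = 22 bits. */
#define MAX_CHAR 0x3FFFFFu

/* Encode internal code C into BUF (room for 5 bytes) using the
   shortest form from the "Multibyte sequence" table; returns the
   number of bytes written (1 to 5). */
static int char_to_multibyte(uint32_t c, unsigned char *buf)
{
    /* Lead-byte marker for a sequence of length 2..5 (index = length). */
    static const unsigned char lead[] = { 0, 0, 0xC0, 0xE0, 0xF0, 0xF8 };
    int len, i;

    assert(c <= MAX_CHAR);
    len = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3
        : c < 0x200000 ? 4 : 5;
    if (len == 1) {
        buf[0] = (unsigned char) c;            /* 0xxxxxxx: ASCII */
        return 1;
    }
    /* Fill the 10xxxxxx trailing bytes from the low end, 6 bits each. */
    for (i = len - 1; i > 0; i--) {
        buf[i] = (unsigned char) (0x80 | (c & 0x3F));
        c >>= 6;
    }
    buf[0] = (unsigned char) (lead[len] | c);  /* remaining high bits */
    return len;
}

/* Decode one character from the N bytes at P into *C.  Returns the
   sequence length, or 0 for a truncated sequence, a bad lead or
   trailing byte, a code in a "not used" (overlong) range, or a code
   above MAX_CHAR. */
static int multibyte_to_char(const unsigned char *p, size_t n, uint32_t *c)
{
    /* Smallest code each length may carry; lower codes are "not used". */
    static const uint32_t min_code[] = { 0, 0, 0x80, 0x800, 0x10000, 0x200000 };
    uint32_t code;
    int len, i;

    if (n == 0)
        return 0;
    if (p[0] < 0x80) {
        *c = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0)      { len = 2; code = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; code = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; code = p[0] & 0x07; }
    else if ((p[0] & 0xFC) == 0xF8) { len = 5; code = p[0] & 0x03; }
    else
        return 0;                   /* 10xxxxxx or 111111xx lead byte */
    if (n < (size_t) len)
        return 0;
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)  /* every trailing byte is 10xxxxxx */
            return 0;
        code = (code << 6) | (p[i] & 0x3F);
    }
    if (code < min_code[len] || code > MAX_CHAR)
        return 0;
    *c = code;
    return len;
}

/* Rows of the "Internal Character code" table (hypothetical names). */
enum char_range {
    RANGE_UNICODE_BMP,           /* 000000h - 00FFFFh */
    RANGE_UNICODE_SUPPLEMENTARY, /* 010000h - 10FFFFh */
    RANGE_EMACS_RESERVED,        /* 110000h - 17FFFFh */
    RANGE_PRIVATE_USE,           /* 180000h - 3087FFh */
    RANGE_CNS_3_16,              /* 308800h - 327FFFh */
    RANGE_CCCII,                 /* 328000h - 3FFFFFh */
    RANGE_INVALID
};

static enum char_range classify_char(uint32_t c)
{
    if (c <= 0xFFFF)   return RANGE_UNICODE_BMP;
    if (c <= 0x10FFFF) return RANGE_UNICODE_SUPPLEMENTARY;
    if (c <= 0x17FFFF) return RANGE_EMACS_RESERVED;
    if (c <= 0x3087FF) return RANGE_PRIVATE_USE;
    if (c <= 0x327FFF) return RANGE_CNS_3_16;
    if (c <= MAX_CHAR) return RANGE_CCCII;
    return RANGE_INVALID;
}
```

The bit layout is the same as UTF-8's, so for U+0000 - U+10FFFF these bytes are ordinary UTF-8. For example, 20AC (the euro sign) encodes as E2 82 AC. Codes from 110000h upward use 4- and 5-byte forms that current UTF-8 (RFC 3629, capped at U+10FFFF) does not allow.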