Tomohiro KUBOTA wrote on 2002-07-23 13:48 UTC:
> A good news. I will have to try it...
It's trivial to install.
> Does it support LC_CTYPE ?
$ man perlunicode
[...]
"use utf8" still needed to enable UTF-8/UTF-EBCDIC in
scripts
As a compatibility measure, the "use utf8" pragma must
be explicitly included to enable recognition of UTF-8
in the Perl scripts themselves (in string or regular
expression literals, or in identifier names) on ASCII-
based machines or to recognize UTF-EBCDIC on EBCDIC-
based machines. These are the only times when an
explicit "use utf8" is needed. See utf8.
[...]
* If your locale environment variables (LANGUAGE,
LC_ALL, LC_CTYPE, LANG) contain the strings 'UTF-8' or
'UTF8' (case-insensitive matching), the default encodings
of your STDIN, STDOUT, and STDERR, and of any
subsequent file open, are considered to be UTF-8.
[...]
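To make the rule quoted above concrete, here is a minimal sketch of the check
as the man page describes it. It is written in Python purely for illustration
and is not Perl's actual implementation; the function name
locale_requests_utf8 and the constant LOCALE_VARS are made up for this
example, and the sketch ignores any precedence between the variables.

```python
import os

# The four variables named in the perlunicode excerpt.
LOCALE_VARS = ("LANGUAGE", "LC_ALL", "LC_CTYPE", "LANG")


def locale_requests_utf8(env=None):
    """Return True if any listed locale variable contains 'UTF-8' or 'UTF8'.

    Matching is case-insensitive, as the excerpt states. This is an
    illustrative assumption about the rule, not Perl's real code.
    """
    if env is None:
        env = os.environ
    for name in LOCALE_VARS:
        value = env.get(name, "").upper()
        if "UTF-8" in value or "UTF8" in value:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical environments, passed as plain dicts for the demo.
    print(locale_requests_utf8({"LANG": "en_GB.UTF-8"}))
    print(locale_requests_utf8({"LC_CTYPE": "ja_JP.utf8"}))
    print(locale_requests_utf8({"LANG": "de_DE.ISO-8859-1"}))
```

Called without an argument, the function inspects the real process
environment, so it gives a rough idea of whether a given shell session would
get the UTF-8 defaults described in the excerpt.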
>
> > Another major milestone reached ... I guess the emacs-unicode is now the
> > only one left ...
>
> Linux console's Unicode support is very poor. It can handle only
> a few hundreds of characters, and cannot handle combining nor doublewidth
> characters. It doesn't have API for CJK input methods.
OK, what I meant with emacs-unicode being the only major milestone left
is that then UTF-8 support on Linux will be good enough such that we can
genuinely start recommending UTF-8 as the better choice to anyone who
currently uses ISO 8859-1, -2, etc. I did not want to imply that all i18n
problems are solved now, but having UTF-8 support good enough for daily
use in European languages is already a very major step forward and will
get two orders of magnitude more Unix/Linux developers interested in
the topic.
There is an amazing amount of progress around UTF-8 happening these
days. Just two examples:
* I just noticed that http://www.google.com/intl/en/, perhaps the
most popular web site on the planet, has switched to UTF-8, which I
guess means that UTF-8 encoded HTML has now really become a
mainstream web technology, not just a geek experiment.
* And http://www.bbc.co.uk/urdu/ is another indication that UTF-8
is now the only reasonable HTML encoding for a significant number
of people.
I wrote ~2 years ago that I expected major increases in the use of UTF-8
within (from then) 2-3 years, and I think that prognosis is still realistic.
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/