Re: Perl 5.8 with significantly improved UTF-8 support is out

Markus Kuhn Tue, 23 Jul 2002 08:55:01 -0700

Tomohiro KUBOTA wrote on 2002-07-23 13:48 UTC:
> Good news.  I will have to try it...


It's trivial to install.

> Does it support LC_CTYPE ?

$ man perlunicode
[...]
       "use utf8" still needed to enable UTF-8/UTF-EBCDIC in
       scripts
           As a compatibility measure, the "use utf8" pragma must
           be explicitly included to enable recognition of UTF-8
           in the Perl scripts themselves (in string or regular
           expression literals, or in identifier names) on ASCII-
           based machines or to recognize UTF-EBCDIC on EBCDIC-
           based machines.  These are the only times when an
           explicit "use utf8" is needed.  See utf8.
[...]
       •   If your locale environment variables (LANGUAGE,
           LC_ALL, LC_CTYPE, LANG) contain the strings 'UTF-8' or
           'UTF8' (case-insensitive matching), the default
           encodings of your STDIN, STDOUT, and STDERR, and of any
           subsequent file open, are considered to be UTF-8.
[...]
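To make the locale rule in that excerpt concrete, here is a small Python sketch of the check it describes. This is not Perl's own code. The function name `locale_implies_utf8` is hypothetical, and the sketch assumes the rule means "any of the four variables matches", because the excerpt does not say whether Perl applies the usual POSIX precedence (LC_ALL over LC_CTYPE over LANG) or stops at the first variable that is set.

```python
import os
from typing import Mapping, Optional

# The four variables named in the perlunicode excerpt above.
LOCALE_VARS = ("LANGUAGE", "LC_ALL", "LC_CTYPE", "LANG")


def locale_implies_utf8(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical re-statement of the quoted rule: True if any locale
    variable contains 'UTF-8' or 'UTF8', compared case-insensitively.

    Assumption: every listed variable is checked ("any match"), not only
    the one with the highest POSIX precedence.
    """
    if env is None:
        env = os.environ  # fall back to the real process environment
    for name in LOCALE_VARS:
        value = env.get(name, "").upper()  # case-insensitive comparison
        if "UTF-8" in value or "UTF8" in value:
            return True
    return False


if __name__ == "__main__":
    print(locale_implies_utf8({"LANG": "en_GB.UTF-8"}))       # True
    print(locale_implies_utf8({"LC_CTYPE": "de_DE.utf8"}))    # True
    print(locale_implies_utf8({"LANG": "de_DE.ISO-8859-1"}))  # False
```

The environment is passed in as a mapping so the rule can be tried without modifying the real environment; called with no argument, the function inspects `os.environ`.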

> 
> > Another major milestone reached ... I guess the emacs-unicode is now the
> > only one left ...
> 
> The Linux console's Unicode support is very poor.  It can handle only
> a few hundred characters, and cannot handle combining or double-width
> characters.  It doesn't have an API for CJK input methods.

OK, what I meant by calling emacs-unicode the only major milestone left
is that, once it is done, UTF-8 support on Linux will be good enough
that we can genuinely start recommending UTF-8 as the better
alternative to anyone who currently uses ISO 8859-1, -2, etc. I did not
want to imply that all i18n problems are now solved. But UTF-8 support
that is good enough for daily use with European languages is already a
major step forward, and it will get two orders of magnitude more
Unix/Linux developers interested in the topic.

There is an amazing amount of progress around UTF-8 happening these
days. Just two examples:

  * I just noticed that http://www.google.com/intl/en/, perhaps the
    most popular web site on the planet, has switched to UTF-8, which I
    guess means that UTF-8-encoded HTML has now really become a
    mainstream web technology, not just a geek experiment.

  * And http://www.bbc.co.uk/urdu/ is another indication that UTF-8
    is now the only reasonable HTML encoding for a significant number
    of people.

About two years ago I wrote that I expected major increases in the use
of UTF-8 within (from then) 2-3 years, and I think that prognosis is
still realistic.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/