bennyc      05/04/24 12:18:59

  Modified:    xml/htdocs/doc/en utf-8.xml
  Log:
  restoring to rev 1.9. Please ignore rev 1.10!

Revision  Changes    Path
1.11      +63 -95    xml/htdocs/doc/en/utf-8.xml

file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/utf-8.xml?rev=1.11&content-type=text/x-cvsweb-markup&cvsroot=gentoo
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/utf-8.xml?rev=1.11&content-type=text/plain&cvsroot=gentoo
diff : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/utf-8.xml.diff?r1=1.10&r2=1.11&cvsroot=gentoo

Index: utf-8.xml
===================================================================
RCS file: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v
retrieving revision 1.10
retrieving revision 1.11
diff -u -r1.10 -r1.11
--- utf-8.xml 24 Apr 2005 03:25:46 -0000 1.10
+++ utf-8.xml 24 Apr 2005 12:18:59 -0000 1.11
@@ -1,5 +1,5 @@
 <?xml version='1.0' encoding="UTF-8"?>
-<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.10 2005/04/24 03:25:46 bennyc Exp $ -->
+<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.11 2005/04/24 12:18:59 bennyc Exp $ -->
 <!DOCTYPE guide SYSTEM "/dtd/guide.dtd">
 
 <guide link="/doc/en/utf-8.xml">
@@ -20,8 +20,8 @@
 
 <license />
 
-<version>1.5</version>
-<date>2005-04-23</date>
+<version>1.8</version>
+<date>2005-04-05</date>
 
 <chapter>
 <title>Character Encodings</title>
@@ -108,12 +108,11 @@
 <body>
 
 <p>
-Unicode throws away the traditional single-byte limit of character sets, and
-even with two bytes per-character this allows a maximum 65,536 characters.
-Although this number is extremely high when compared to seven-bit and eight-bit
-encodings, it is still not enough for a character set designed to be used for
-symbols and scripts used only by scholars, and symbols that are only used in
-mathematics and other specialised fields.
+Unicode throws away the traditional single-byte limit of character sets. It
+uses 17 "planes" of 65,536 code points to describe a maximum of 1,114,112
+characters. As the first plane, aka. "Basic Multilingual Plane" or BMP,
+contains almost everything you will ever use, many have made the wrong
+assumption that Unicode was a 16-bit character set.
 </p>
 
 <p>
@@ -150,7 +149,7 @@
 
 <p>
 UTF-8 allows you to work in a standards-compliant and internationally accepted
-multilingual environment, with a comparitively low data redundancy. UTF-8 is
+multilingual environment, with a comparatively low data redundancy. UTF-8 is
 the preferred way for transmitting non-ASCII characters over the Internet,
 through Email, IRC or almost any other medium. Despite this, many people regard
 UTF-8 in online communication as abusive. It is always best to be aware of the
@@ -212,6 +211,16 @@
 # <i>localedef -i en_GB -f UTF-8 en_GB.utf8</i>
 </pre>
 
+<p>
+Another way to include a UTF-8 locale is to add it to the
+<path>/etc/locales.build</path> file and rebuild <c>glibc</c> with the
+<c>userlocales</c> USE flag set.
+</p>
+
+<pre caption="Line in /etc/locales.build">
+en_GB.UTF-8/UTF-8
+</pre>
+
 </body>
 </section>
 <section>
@@ -219,67 +228,32 @@
 <body>
 
 <p>
-There are two environment variables that need to be set in order to use
-our new UTF-8 locales: <c>LANG</c> and <c>LC_ALL</c>. There are also
-many different ways to set them; some people prefer to only have a UTF-8
-environment for a specific user, in which case they set them in their
-<path>~/.profile</path> or <path>~/.bashrc</path>. Others prefer to set the
-locale globally. One specific circumstance where the author particularly
-recommends doing this is when <path>/etc/init.d/xdm</path> is in use, because
-this init script starts the display manager and desktop before any of the
-aforementioned shell startup files are sourced, and so before any of the
-variables are in the environment.
+Although by now you might be determined to use UTF-8 system wide, the author
+does not recommend setting UTF-8 for the root user. Instead, it is best to set
+the locale in your user's <path>~/.profile</path> (or, if you are using a C
+shell, <path>~/.login</path>).
 </p>
 
-<p>
-Setting the locale globally should be done using
-<path>/etc/env.d/02local</path>. The file should look something like the
-following:
-</p>
-
-<pre caption="Demonstration /etc/env.d/02locale">
-<comment>(As always, change "en_GB.UTF-8" to your locale)</comment>
-LC_ALL="en_GB.UTF-8"
-LOCALE="en_GB.UTF-8"
-</pre>
-
-<p>
-Next, the environment must be updated with the change.
-</p>
+<note>
+If you are not sure which file to use, use <path>~/.profile</path>. Also, if
+you are unsure which code listing to use, use the Bourne version.
+</note>
 
-<pre caption="Updating the environment">
-# <i>env-update</i>
->>> Regenerating /etc/ld.so.cache...
- * Caching service dependencies ...
- # <i>source /etc/profile</i>
+<pre caption="Setting the locale with environment variables (Bourne version)">
+export LANG="en_GB.utf8"
+export LC_ALL="en_GB.utf8"
 </pre>
 
-<p>
-Now, run <c>locale</c> with no arguments to see if we have the correct
-variables in our environment:
-</p>
-
-<pre caption="Checking if our new locale is in the environment">
-# <i>locale</i>
-LANG=en_GB.UTF-8
-LC_CTYPE="en_GB.UTF-8"
-LC_NUMERIC="en_GB.UTF-8"
-LC_TIME="en_GB.UTF-8"
-LC_COLLATE="en_GB.UTF-8"
-LC_MONETARY="en_GB.UTF-8"
-LC_MESSAGES="en_GB.UTF-8"
-LC_PAPER="en_GB.UTF-8"
-LC_NAME="en_GB.UTF-8"
-LC_ADDRESS="en_GB.UTF-8"
-LC_TELEPHONE="en_GB.UTF-8"
-LC_MEASUREMENT="en_GB.UTF-8"
-LC_IDENTIFICATION="en_GB.UTF-8"
-LC_ALL=en_GB.UTF-8
+<pre caption="Setting the locale with environment variables (C shell version)">
+setenv LANG "en_GB.utf8"
+setenv LC_ALL "en_GB.utf8"
 </pre>
 
 <p>
-That is all. You are now using UTF-8 locales, and the next hurdle is the
-configuration of the applications you use from day to day.
+Now, logout and back in to apply the change. We want these environment
+variables in our entire environment, so it is best to logout and back in, or at
+the very least to source <path>~/.profile</path> or <path>~/.login</path> in
+the console from which you have started other processes.
 </p>
 
 </body>
@@ -376,7 +350,7 @@
 
 <pre caption="Example /etc/conf.d/keymaps snippet">
 <comment>(Change "uk" to your local layout)</comment>
-KEYMAP="uk"
+KEYMAP="-u uk"
 </pre>
 
 </body>
@@ -403,7 +377,8 @@
 
 <p>
 We also need to rebuild packages that link to these, now the USE changes have
-been applied.
+been applied. The tool we use (<c>revdep-rebuild</c>) is part of the
+<c>gentoolkit</c> package.
 </p>
 
 <pre caption="Rebuilding of programs that link to ncurses or slang">
@@ -457,6 +432,11 @@
 <title>X11 and Fonts</title>
 <body>
 
+<impo>
+<c>x11-base/xorg-x11</c> has far better support for Unicode than XFree86
+and is <e>highly</e> recommended.
+</impo>
+
 <p>
 TrueType fonts have support for Unicode, and most of the fonts that ship with
 Xorg have impressive character support, although, obviously, not every single
@@ -481,10 +461,10 @@
 <body>
 
 <p>
-Window managers, even those not built on GTK or Qt, generally have very
-good Unicode support, as they often use the Xft library for handling
-fonts. If your window manager does not use Xft for fonts, you can still
-use the FontSpec mentioned in the previous section as a Unicode font.
+Window managers not built on GTK or Qt generally have very good Unicode
+support, as they often use the Xft library for handling fonts. If your window
+manager does not use Xft for fonts, you can still use the FontSpec mentioned in
+the previous section as a Unicode font.

<<Truncated>>
