bennyc      05/04/24 03:25:46

  Modified:    xml/htdocs/doc/en utf-8.xml
  Log:
  bug 90144
Revision  Changes    Path
1.10      +95 -63    xml/htdocs/doc/en/utf-8.xml

file : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/utf-8.xml?rev=1.10&content-type=text/x-cvsweb-markup&cvsroot=gentoo
plain: http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/utf-8.xml?rev=1.10&content-type=text/plain&cvsroot=gentoo
diff : http://www.gentoo.org/cgi-bin/viewcvs.cgi/xml/htdocs/doc/en/utf-8.xml.diff?r1=1.9&r2=1.10&cvsroot=gentoo

Index: utf-8.xml
===================================================================
RCS file: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v
retrieving revision 1.9
retrieving revision 1.10
diff -u -r1.9 -r1.10
--- utf-8.xml 5 Apr 2005 08:59:28 -0000 1.9
+++ utf-8.xml 24 Apr 2005 03:25:46 -0000 1.10
@@ -1,5 +1,5 @@
 <?xml version='1.0' encoding="UTF-8"?>
-<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.9 2005/04/05 08:59:28 neysx Exp $ -->
+<!-- $Header: /var/cvsroot/gentoo/xml/htdocs/doc/en/utf-8.xml,v 1.10 2005/04/24 03:25:46 bennyc Exp $ -->
 <!DOCTYPE guide SYSTEM "/dtd/guide.dtd">

 <guide link="/doc/en/utf-8.xml">
@@ -20,8 +20,8 @@

 <license />

-<version>1.8</version>
-<date>2005-04-05</date>
+<version>1.5</version>
+<date>2005-04-23</date>

 <chapter>
 <title>Character Encodings</title>
@@ -108,11 +108,12 @@
 <body>

 <p>
-Unicode throws away the traditional single-byte limit of character sets. It
-uses 17 "planes" of 65,536 code points to describe a maximum of 1,114,112
-characters. As the first plane, aka. "Basic Multilingual Plane" or BMP,
-contains almost everything you will ever use, many have made the wrong
-assumption that Unicode was a 16-bit character set.
+Unicode throws away the traditional single-byte limit of character sets, and
+even with two bytes per-character this allows a maximum 65,536 characters.
+Although this number is extremely high when compared to seven-bit and eight-bit
+encodings, it is still not enough for a character set designed to be used for
+symbols and scripts used only by scholars, and symbols that are only used in
+mathematics and other specialised fields.
 </p>

 <p>
@@ -149,7 +150,7 @@

 <p>
 UTF-8 allows you to work in a standards-compliant and internationally accepted
-multilingual environment, with a comparatively low data redundancy. UTF-8 is
+multilingual environment, with a comparitively low data redundancy. UTF-8 is
 the preferred way for transmitting non-ASCII characters over the Internet,
 through Email, IRC or almost any other medium. Despite this, many people regard
 UTF-8 in online communication as abusive. It is always best to be aware of the
@@ -211,16 +212,6 @@
 # <i>localedef -i en_GB -f UTF-8 en_GB.utf8</i>
 </pre>

-<p>
-Another way to include a UTF-8 locale is to add it to the
-<path>/etc/locales.build</path> file and rebuild <c>glibc</c> with the
-<c>userlocales</c> USE flag set.
-</p>
-
-<pre caption="Line in /etc/locales.build">
-en_GB.UTF-8/UTF-8
-</pre>
-
 </body>
 </section>
 <section>
@@ -228,32 +219,67 @@
 <body>

 <p>
-Although by now you might be determined to use UTF-8 system wide, the author
-does not recommend setting UTF-8 for the root user. Instead, it is best to set
-the locale in your user's <path>~/.profile</path> (or, if you are using a C
-shell, <path>~/.login</path>).
+There are two environment variables that need to be set in order to use
+our new UTF-8 locales: <c>LANG</c> and <c>LC_ALL</c>. There are also
+many different ways to set them; some people prefer to only have a UTF-8
+environment for a specific user, in which case they set them in their
+<path>~/.profile</path> or <path>~/.bashrc</path>. Others prefer to set the
+locale globally. One specific circumstance where the author particularly
+recommends doing this is when <path>/etc/init.d/xdm</path> is in use, because
+this init script starts the display manager and desktop before any of the
+aforementioned shell startup files are sourced, and so before any of the
+variables are in the environment.
 </p>

-<note>
-If you are not sure which file to use, use <path>~/.profile</path>. Also, if
-you are unsure which code listing to use, use the Bourne version.
-</note>
+<p>
+Setting the locale globally should be done using
+<path>/etc/env.d/02local</path>. The file should look something like the
+following:
+</p>

-<pre caption="Setting the locale with environment variables (Bourne version)">
-export LANG="en_GB.utf8"
-export LC_ALL="en_GB.utf8"
+<pre caption="Demonstration /etc/env.d/02locale">
+<comment>(As always, change "en_GB.UTF-8" to your locale)</comment>
+LC_ALL="en_GB.UTF-8"
+LOCALE="en_GB.UTF-8"
 </pre>

-<pre caption="Setting the locale with environment variables (C shell version)">
-setenv LANG "en_GB.utf8"
-setenv LC_ALL "en_GB.utf8"
+<p>
+Next, the environment must be updated with the change.
+</p>
+
+<pre caption="Updating the environment">
+# <i>env-update</i>
+>>> Regenerating /etc/ld.so.cache...
+ * Caching service dependencies ...
+ # <i>source /etc/profile</i>
 </pre>

 <p>
-Now, logout and back in to apply the change. We want these environment
-variables in our entire environment, so it is best to logout and back in, or at
-the very least to source <path>~/.profile</path> or <path>~/.login</path> in
-the console from which you have started other processes.
+Now, run <c>locale</c> with no arguments to see if we have the correct
+variables in our environment:
+</p>
+
+<pre caption="Checking if our new locale is in the environment">
+# <i>locale</i>
+LANG=en_GB.UTF-8
+LC_CTYPE="en_GB.UTF-8"
+LC_NUMERIC="en_GB.UTF-8"
+LC_TIME="en_GB.UTF-8"
+LC_COLLATE="en_GB.UTF-8"
+LC_MONETARY="en_GB.UTF-8"
+LC_MESSAGES="en_GB.UTF-8"
+LC_PAPER="en_GB.UTF-8"
+LC_NAME="en_GB.UTF-8"
+LC_ADDRESS="en_GB.UTF-8"
+LC_TELEPHONE="en_GB.UTF-8"
+LC_MEASUREMENT="en_GB.UTF-8"
+LC_IDENTIFICATION="en_GB.UTF-8"
+LC_ALL=en_GB.UTF-8
+</pre>
+
+<p>
+That is all. You are now using UTF-8 locales, and the next hurdle is the
+configuration of the applications you use from day to day.
 </p>

 </body>
@@ -350,7 +376,7 @@

 <pre caption="Example /etc/conf.d/keymaps snippet">
 <comment>(Change "uk" to your local layout)</comment>
-KEYMAP="-u uk"
+KEYMAP="uk"
 </pre>

 </body>
@@ -377,8 +403,7 @@

 <p>
 We also need to rebuild packages that link to these, now the USE changes have
-been applied. The tool we use (<c>revdep-rebuild</c>) is part of the
-<c>gentoolkit</c> package.
+been applied.
 </p>

 <pre caption="Rebuilding of programs that link to ncurses or slang">
@@ -432,11 +457,6 @@
 <title>X11 and Fonts</title>
 <body>

-<impo>
-<c>x11-base/xorg-x11</c> has far better support for Unicode than XFree86
-and is <e>highly</e> recommended.
-</impo>
-
 <p>
 TrueType fonts have support for Unicode, and most of the fonts that ship with
 Xorg have impressive character support, although, obviously, not every single
@@ -461,10 +481,10 @@
 <body>

 <p>
-Window managers not built on GTK or Qt generally have very good Unicode
-support, as they often use the Xft library for handling fonts. If your window
-manager does not use Xft for fonts, you can still use the FontSpec mentioned in
-the previous section as a Unicode font.
+Window managers, even those not built on GTK or Qt, generally have very
+good Unicode support, as they often use the Xft library for handling
+fonts. If your window manager does not use Xft for fonts, you can still
+use the FontSpec mentioned in the previous section as a Unicode font.

<<Truncated>>

--
[email protected] mailing list
