[Please install Microsoft or DejaVu fonts and use Mozilla Thunderbird,
Evolution or KMail in order to read my reply]
Jim Gifford wrote:
> Alex, here are the notes I came up with comparing what you have done to
> what the distros have done. I'm currently researching this to add UTF-8
> support to the Cross-LFS book.
Many thanks for asking questions!
> 1 - Ncurses - On locales that are not UTF-8 capable, shouldn't you also
> build the narrowc libraries?
> Ncurses in the UTF-8 book should build both narrowc and widec to
> maintain compatibility with non-UTF-8 locales.
The wide-character library supports both UTF-8 and traditional 8-bit
locales. It stores characters internally as wchar_t values that
correspond to UCS-4, and converts from/to the locale encoding on the fly
when needed, without caring whether that encoding is UTF-8 or something
else.
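To make the point concrete, here is a rough Python analogy. It is not ncurses code: `to_wide` and `from_wide` are hypothetical names standing in for the C library's mbstowcs()/wcstombs()-style conversions, and Python's str plays the role of a wchar_t array. It assumes Python 3 with the standard `koi8_r` codec, and shows one internal value serving both a KOI8-R and a UTF-8 locale:

```python
# Analogy only: Python's str holds Unicode code points, much like a
# wchar_t array holding UCS-4 values in the ncurses wide-character
# library. Bytes are converted from/to the locale encoding at the edges.


def to_wide(raw: bytes, locale_encoding: str) -> list:
    """Decode locale-encoded bytes into code points (cf. mbstowcs())."""
    return [ord(ch) for ch in raw.decode(locale_encoding)]


def from_wide(code_points: list, locale_encoding: str) -> bytes:
    """Encode code points into the locale's bytes (cf. wcstombs())."""
    return "".join(chr(cp) for cp in code_points).encode(locale_encoding)


# CYRILLIC CAPITAL LETTER BE as it arrives in two different locales.
from_koi8 = to_wide(b"\xe2", "koi8_r")     # traditional 8-bit locale
from_utf8 = to_wide(b"\xd0\x91", "utf-8")  # UTF-8 locale

# Internally both are the same single value, U+0411.
assert from_koi8 == from_utf8 == [0x0411]

# On output, the value goes back through whatever the locale's encoding is.
print(from_wide(from_koi8, "utf-8"))   # b'\xd0\x91'
print(from_wide(from_utf8, "koi8_r"))  # b'\xe2'
```

The same internal representation is what lets one library build cover both kinds of locale.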
> 2 - Groff - Why not use Groff 1.19.2 without the Debian patch? See
> http://lists.gnu.org/archive/html/groff/2005-09/msg00004.html,
I assume you mean this:
Troff
-----
o Cyrillic characters have been added to the `utf8' and `html' output
devices.
Grotty
------
o Experimental support for zero-width and double-width characters.
However, this doesn't match the way non-ISO-8859-1 manual pages are
written. New groff understands something like \[u0411] (i.e., the
characters \, [, u, 0, 4, 1, 1, ]) as a request to typeset CYRILLIC
CAPITAL LETTER BE (Б). The current non-RedHat Russian manual pages in
KOI8-R just include a single byte with the code 0xe2 (which corresponds
to that letter in the KOI8-R encoding) instead of that strange escape
sequence that nobody uses. RedHat uses a patched (but very buggy)
groff-1.18.1.1 that understands UTF-8 input; there, the letter would be
two bytes: 0xd0 0x91. This looks like a task for a preprocessor, but see
below.
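The preprocessor idea can be sketched in Python. This is a hypothetical filter (`to_groff_escapes` is an illustrative name, not an existing tool) that assumes the page's encoding is known in advance and rewrites each non-ASCII character as the \[uXXXX] escape that new groff understands. The single KOI8-R byte 0xe2 and the two UTF-8 bytes 0xd0 0x91 both come out as \[u0411]:

```python
def to_groff_escapes(page: bytes, page_encoding: str) -> str:
    """Rewrite non-ASCII characters as groff \\[uXXXX] escapes (sketch)."""
    out = []
    for ch in page.decode(page_encoding):
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII, including troff markup, passes through
        else:
            # groff expects uppercase hex digits, zero-padded to four
            out.append("\\[u%04X]" % ord(ch))
    return "".join(out)


print(to_groff_escapes(b"\xe2", "koi8_r"))     # \[u0411]
print(to_groff_escapes(b"\xd0\x91", "utf-8"))  # \[u0411]
```

This only illustrates what such a preprocessor would have to do; it is not the approach LFS takes.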
Manual pages are not going to be stored in UTF-8 in LFS. That would
require adding conversion instructions to every BLFS package that comes
with at least one translated manual page. If you think it's better to do
it now, because this conversion will be needed anyway at some point due
to the mix of old-style and new-style packages, I will reconsider some
of my other points.
> with a discussion started here about UTF-8:
> http://lists.gnu.org/archive/html/groff/2005-07/msg00006.html
The link above describes how to use a preprocessor in order to read
UTF-8 encoded manual pages in UTF-8 locales. This is NOT what we want.
We want already-existing (non-UTF-8) manual pages to be readable in both
traditional and UTF-8 locales.
> 3 - Grep - Minor patch needed -
> http://www.openi18n.org/subgroups/utildev/patch/grep-2.5.1-i18n-0.1.patch.gz
That patch is part of grep-2.5.1a-redhat_fixes-1.patch, along with many
improvements.
> 4 - Diffutils - Minor patch needed -
> http://www.openi18n.org/subgroups/utildev/patch/diffutils-2.8.1-i18n-0.2.patch.gz
That patch is part of diffutils-2.8.1-i18n-1.patch.
In fact, a number of bugs were reported against the versions of these
patches hosted on openi18n.org, and RedHat fixed them. That's why the
RedHat patches, not the original ones, are used.
> 5 - Most distros don't use man-db - Why did you select it? Fedora
> uses just man.
Not man, but a severely patched man. Fedora's man doesn't work in
non-UTF-8 locales. The original man has a problem with its own
translated messages. If one compiles it as LFS says, "man foobar"
results in:
Cannot open the message catalog "man" for locale "ru_RU.UTF-8"
(NLSPATH="<none>")
No manual entry for foobar
If one adds +lang en,ru (or all), the result in the UTF-8 locale is:
[a sequence of square boxes] foobar [a sequence of square boxes]
That's because "man" forgets to convert from the translator's charset
(KOI8-R) to the user's one (UTF-8 in this case). The correct result
should be:
Ничего про foobar в руководстве нет
(Russian for "No manual entry for foobar")
Fedora patched all translated messages in the "man" program in order for
them to show properly in UTF-8 locales, but this broke them for
non-UTF-8 ones.
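The conversion step that the original man skips can be sketched in Python. The message and the KOI8-R catalog charset come from the example above, while `convert_message` is a hypothetical name; real programs do this in C, typically via iconv. Decoding the unconverted KOI8-R bytes as UTF-8 yields replacement characters, the equivalent of the square boxes:

```python
MESSAGE = "Ничего про foobar в руководстве нет"
catalog_bytes = MESSAGE.encode("koi8_r")  # as stored in the translator's charset


def convert_message(msg: bytes, catalog_charset: str, user_charset: str) -> bytes:
    """Re-encode a catalog message from the translator's charset to the user's."""
    return msg.decode(catalog_charset).encode(user_charset)


# Without conversion: KOI8-R Cyrillic bytes are not valid UTF-8, so each
# letter turns into U+FFFD (shown as a box); only the ASCII parts, such
# as "foobar", survive.
print(catalog_bytes.decode("utf-8", errors="replace"))

# With conversion: valid UTF-8 that a UTF-8 terminal displays correctly.
print(convert_message(catalog_bytes, "koi8_r", "utf-8").decode("utf-8"))
```

Because the conversion target is the user's charset rather than a hard-coded UTF-8, the same catalog also stays correct in a KOI8-R locale.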
If you compile man with "+lang none" and explain the reason for it in
the book, I will withdraw this objection (as English messages will
always be used and no complaints about NLSPATH will ever be seen), but
the following remains:
JNROFF is only satisfied by a groff that has a "nippon" device, and only
Debian's groff has one. Maybe the groff-utf8 wrapper can also be
modified to accept EUC-JP, but I don't want to be the first to implement
this.
For non-Japanese manual pages, the proper setup for the NROFF line would
have to handle three cases:
1) non-UTF-8 locales
2) ISO-8859-1 encoded manual pages, UTF-8 locales
3) manual pages in another 8-bit encoding (e.g. KOI8-R is traditionally
used for Russian manual pages), UTF-8 locales
See the man-i18n hint (the "Hacks" section) for details.
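The three-way choice can be sketched as a small decision function. This is a hypothetical helper (`nroff_case` is not part of man or Man-DB). It assumes the page's encoding is already known and only picks the case; the commands to run for each case are in the man-i18n hint and are not reproduced here:

```python
def nroff_case(locale_charmap: str, page_encoding: str) -> int:
    """Return 1, 2 or 3 for the cases listed above (hypothetical helper)."""
    charmap = locale_charmap.upper()
    encoding = page_encoding.upper()
    if charmap != "UTF-8":
        return 1  # non-UTF-8 locale
    if encoding in ("ISO-8859-1", "LATIN1"):
        return 2  # ISO-8859-1 manual page, UTF-8 locale
    return 3      # other 8-bit manual page (e.g. KOI8-R), UTF-8 locale


for charmap, enc in [("KOI8-R", "KOI8-R"), ("UTF-8", "ISO-8859-1"),
                     ("UTF-8", "KOI8-R")]:
    print(charmap, enc, "->", nroff_case(charmap, enc))
```

Each branch corresponds to one more conditional instruction that a reader of the book would have to follow.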
For me, that means too many "if"s in the book (on top of the ones
already present on the Man-DB page, such as "if upstream distributes
manual pages with RedHat in mind, please convert", which stay).
Man-DB does the Right Thing in all three cases, and outputs translations
of "No manual entry for foobar" properly in all locales, due to its use
of gettext.
BTW, you forgot one more issue:
6 - Nothing is said about mounting filesystems of DOS/Windows origin
(VFAT, NTFS, ISOFS, SMBFS and CIFS) in fstab, namely about the
"iocharset" and "codepage" parameters.
Sorry for that. The example line for ru_RU.UTF-8 would be:
/dev/fd0 /media/floppy vfat noauto,user,iocharset=utf8,codepage=866 0 0
The "iocharset" parameter should be one of the kernel-supported charsets
(see "File Systems" -> "Native Language Support") that most closely
matches the output of "locale charmap". The "codepage" parameter should
reflect the legacy DOS codepage number used in the country.
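Choosing the two values can be sketched in Python. The charmap-to-charset table is an illustrative assumption covering only a few locales (the values are meant to be kernel NLS charset names; check them against "Native Language Support" in your kernel), and `vfat_fstab_line` is a hypothetical helper that reproduces the example line above:

```python
# Illustrative mapping from `locale charmap` output to kernel NLS
# charset names (an assumption; extend it for other locales).
KERNEL_IOCHARSET = {
    "UTF-8": "utf8",
    "KOI8-R": "koi8-r",
    "ISO-8859-1": "iso8859-1",
    "ISO-8859-15": "iso8859-15",
    "CP1251": "cp1251",
}


def vfat_fstab_line(charmap: str, codepage: int,
                    device: str = "/dev/fd0",
                    mountpoint: str = "/media/floppy") -> str:
    """Build a VFAT fstab entry for the given locale charmap and DOS codepage."""
    iocharset = KERNEL_IOCHARSET[charmap]
    options = f"noauto,user,iocharset={iocharset},codepage={codepage}"
    return f"{device} {mountpoint} vfat {options} 0 0"


# ru_RU.UTF-8: UTF-8 charmap, Russian DOS codepage 866.
print(vfat_fstab_line("UTF-8", 866))
```

On a running system, Python can obtain the same value as `locale charmap` with `locale.setlocale(locale.LC_ALL, "")` followed by `locale.nl_langinfo(locale.CODESET)`. The codepage still has to be chosen by hand, since it depends on the country rather than on the charmap.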
As an alternative to specifying "iocharset" and "codepage" each time,
it is possible to set the following kernel configuration parameters:
1) "File Systems" -> "Native Language Support" -> "Default NLS Option"
to the desired iocharset value (this covers ISOFS and SMBFS);
2) "File Systems" -> "DOS/FAT/NT Filesystems" -> "Default codepage for
FAT" and "Default iocharset for FAT" to the corresponding codepage and
iocharset values.
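To check which defaults an already-configured kernel has, its .config file can be scanned. The symbol names CONFIG_NLS_DEFAULT, CONFIG_FAT_DEFAULT_CODEPAGE and CONFIG_FAT_DEFAULT_IOCHARSET are assumed to be the ones behind these menu entries and should be verified against the Kconfig of your kernel version; `nls_defaults` is a hypothetical helper:

```python
import re

# Kconfig symbols assumed to back the menu entries above.
SYMBOLS = ("CONFIG_NLS_DEFAULT", "CONFIG_FAT_DEFAULT_CODEPAGE",
           "CONFIG_FAT_DEFAULT_IOCHARSET")


def nls_defaults(config_text: str) -> dict:
    """Return the NLS/FAT default settings found in kernel .config text."""
    found = {}
    for line in config_text.splitlines():
        m = re.match(r"(CONFIG_\w+)=(.*)", line)
        if m and m.group(1) in SYMBOLS:
            found[m.group(1)] = m.group(2).strip('"')  # drop string quotes
    return found


# Sample values matching the ru_RU.UTF-8 example.
sample = """CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_FAT_DEFAULT_CODEPAGE=866
CONFIG_FAT_DEFAULT_IOCHARSET="utf8"
"""
print(nls_defaults(sample))
```

In practice `config_text` would be read from the .config file in the kernel source tree, wherever that tree lives on the system.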
WARNING: use of "utf8" iocharset makes filenames on VFAT filesystems
case-sensitive.
--
Alexander E. Patrakov
--
http://linuxfromscratch.org/mailman/listinfo/lfs-dev
FAQ: http://www.linuxfromscratch.org/faq/
Unsubscribe: See the above information page