[Please install Microsoft or DejaVu fonts and use Mozilla Thunderbird, Evolution or KMail in order to read my reply]

Jim Gifford wrote:
Alex, here are the notes I came up with comparing what you have done to what the distros have done. I'm currently researching this to add UTF-8 support to the Cross-LFS book.

Many thanks for asking questions!

1 - Ncurses - For locales that are not UTF-8 capable, shouldn't you also build the narrowc libraries? Ncurses in the UTF-8 book should build both narrowc and widec to maintain compatibility with non-UTF-8 locales.

The wide-character library supports both UTF-8 and traditional 8-bit locales. It stores characters internally as wchar_t values, which correspond to UCS-4, and converts from/to the locale encoding (without caring whether it is UTF-8 or something else) on the fly when needed.
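As a rough illustration of that model (in Python rather than C, and not actual ncurses code), the sketch below keeps text internally as Unicode code points and converts from/to the locale encoding only at the boundary. The function names and the sample string are made up for this example.

```python
def read_from_terminal(raw, locale_encoding):
    """Decode bytes in the locale encoding into code points (the wchar_t analogue)."""
    return [ord(ch) for ch in raw.decode(locale_encoding)]


def write_to_terminal(code_points, locale_encoding):
    """Encode the internal code points back into the locale encoding."""
    return "".join(chr(cp) for cp in code_points).encode(locale_encoding)


text = "caf\u00e9"
for encoding in ("iso-8859-1", "utf-8"):
    raw = text.encode(encoding)
    internal = read_from_terminal(raw, encoding)
    # The byte strings differ per locale, the internal code points do not.
    print(encoding, raw, [hex(cp) for cp in internal])
    assert write_to_terminal(internal, encoding) == raw
```

The point of the design is that only the two boundary functions know about the locale charset; everything in between works on the same code points regardless of locale.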

2 - Groff - Why not use Groff 1.19.2 without the Debian patch? See http://lists.gnu.org/archive/html/groff/2005-09/msg00004.html,

I assume you mean this:

Troff
-----
o Cyrillic characters have been added to the `utf8' and `html' output
  devices.

Grotty
------

o Experimental support for zero-width and double-width characters.

However, this doesn't match the way non-ISO-8859-1 manual pages are written. New groff understands something like \[u0411] (i.e., the characters \, [, u, 0, 4, 1, 1, ]) as a request to typeset CYRILLIC CAPITAL LETTER BE (Б). The current non-RedHat Russian manual pages in KOI8-R just include a single byte with the code 0xe2 (which corresponds to that letter in the KOI8-R encoding) instead of that strange escape sequence that nobody uses. RedHat uses a patched (but very buggy) groff-1.18.1.1 that understands UTF-8 input. In that case, the letter would be two bytes: 0xd0 0x91. This looks like a task for a preprocessor, but see below.
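To make the three representations concrete, here is a small Python sketch (Python is used only for illustration; it is not part of any groff tooling) that derives each of them for the letter Б. The helper name groff_escape is hypothetical.

```python
import unicodedata


def groff_escape(ch):
    """Spell a character as a groff \\[uXXXX] escape (hypothetical helper)."""
    return "\\[u%04X]" % ord(ch)


letter = "\u0411"
print(unicodedata.name(letter))       # CYRILLIC CAPITAL LETTER BE
print(groff_escape(letter))           # \[u0411]
print(letter.encode("koi8-r").hex())  # e2   (one byte in KOI8-R)
print(letter.encode("utf-8").hex())   # d091 (two bytes in UTF-8)
```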

Manual pages are not going to be stored in UTF-8 in LFS. That would require adding conversion instructions to every BLFS package that comes with at least one translated manual page. If you think it's better to do this now, because the conversion will be needed anyway at some point in the future due to the mix of old-style and new-style packages, I will reconsider some of my other points.

with a discussion started here about utf-8. http://lists.gnu.org/archive/html/groff/2005-07/msg00006.html

The link above describes how to use a preprocessor in order to read UTF-8 encoded manual pages in UTF-8 locales. This is NOT what we want. We want already-existing (non-UTF-8) manual pages to be readable in both traditional and UTF-8 locales.

3 - Grep - Minor patch needed - http://www.openi18n.org/subgroups/utildev/patch/grep-2.5.1-i18n-0.1.patch.gz

Part of grep-2.5.1a-redhat_fixes-1.patch, with many improvements.

4 - Diffutils - Minor patch needed - http://www.openi18n.org/subgroups/utildev/patch/diffutils-2.8.1-i18n-0.2.patch.gz

Part of diffutils-2.8.1-i18n-1.patch

In fact, a number of bugs were reported against the versions of the patches hosted on openi18n.org, and RedHat fixed them. That's why the RedHat patches, not the original ones, are used.

5 - Most distros don't use man-db. Why did you select it? Fedora uses just man.

Not man, but a severely patched man. Fedora's man doesn't work in non-UTF-8 locales. The original man has a problem with its own translated messages. If one compiles it as LFS says, "man foobar" results in:

Cannot open the message catalog "man" for locale "ru_RU.UTF-8"
(NLSPATH="<none>")

No manual entry for foobar

If one adds -lang en,ru (or all), the result in the UTF-8 locale is:

[a sequence of square boxes] foobar [a sequence of square boxes]

That's because "man" forgets to convert from the translator's charset (KOI8-R) to the user's one (UTF-8 in this case). The correct result should be:
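The missing step can be reproduced outside of man. The Python sketch below only illustrates the charset mismatch; it is not man's actual code, and the function name recode is made up here.

```python
def recode(raw, source_charset, target_charset):
    """Convert bytes from the translator's charset to the user's charset."""
    return raw.decode(source_charset).encode(target_charset)


message = "Ничего про foobar в руководстве нет"
catalog_bytes = message.encode("koi8-r")  # as written by the translator

# Without conversion: KOI8-R bytes get interpreted as UTF-8 by the terminal.
broken = catalog_bytes.decode("utf-8", errors="replace")

# With conversion: recode to the locale charset before printing.
fixed = recode(catalog_bytes, "koi8-r", "utf-8").decode("utf-8")

print(broken)  # replacement characters around the word "foobar"
print(fixed)
```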

Ничего про foobar в руководстве нет

Fedora patched all translated messages in the "man" program in order for them to show properly in UTF-8 locales, but this broke them for non-UTF-8 ones.

If you compile man with "+lang none" and explain the reason for it in the book, I will take this objection back (as English messages will always be used and no complaints about NLSPATH will ever be seen), but the following remains:

JNROFF is only satisfied by a groff that has a "nippon" device, and that means only Debian Groff. Maybe the groff-utf8 wrapper could also be modified to accept EUC-JP, but I don't want to be the first to implement this.

For non-Japanese, the proper setup for the NROFF line would include three cases:

1) non-UTF-8 locales
2) ISO-8859-1 encoded manual pages, UTF-8 locales
3) manual pages in another 8-bit encoding (e.g. KOI8-R is traditionally used for Russian manual pages), UTF-8 locales
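Just to show how the three cases are told apart (and nothing more: what to actually put on the NROFF line for each case is not covered here), a minimal Python sketch could look like this. The function name and its arguments are hypothetical.

```python
def nroff_case(locale_charmap, page_encoding):
    """Return which of the three cases above (1, 2 or 3) applies.

    locale_charmap: the output of "locale charmap", e.g. "UTF-8" or "KOI8-R"
    page_encoding:  the encoding the manual page is stored in
    """
    if locale_charmap.upper() != "UTF-8":
        return 1  # non-UTF-8 locale
    if page_encoding.upper() == "ISO-8859-1":
        return 2  # ISO-8859-1 page, UTF-8 locale
    return 3      # other 8-bit page (e.g. KOI8-R), UTF-8 locale


print(nroff_case("KOI8-R", "KOI8-R"))
print(nroff_case("UTF-8", "ISO-8859-1"))
print(nroff_case("UTF-8", "KOI8-R"))
```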

See the man-i18n hint (the "Hacks" section) for details.

For me, that means too many "if"s in the book (the ones already present on the Man-DB page, such as "if upstream distributes manual pages with RedHat in mind, please convert", stay).

Man-DB does the Right Thing in all three cases, and outputs translations of "No manual entry for foobar" properly in all locales, due to its use of gettext.

BTW, you forgot one more issue:

6 - Nothing is said about mounting filesystems with DOS/Windows origin (VFAT, NTFS, ISOFS, SMBFS and CIFS) in fstab, namely about the "iocharset" and "codepage" parameters.

Sorry for that. The example line for ru_RU.UTF-8 would be:

/dev/fd0 /media/floppy vfat noauto,user,iocharset=utf8,codepage=866 0 0

The "iocharset" parameter should be one of the kernel-supported charsets (see "File Systems" -> "Native Language Support") that most closely matches the output of "locale charmap". The "codepage" parameter should reflect the legacy DOS codepage number used in the country.
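As a sketch of how such a line is assembled from the two values, the Python snippet below builds it from the "locale charmap" output and the DOS codepage number. The mapping table is an illustrative subset that I am assuming here, not a complete list of kernel NLS names, and the function name is made up.

```python
# Illustrative subset only: "locale charmap" output -> kernel NLS charset name.
CHARMAP_TO_IOCHARSET = {
    "UTF-8": "utf8",
    "KOI8-R": "koi8-r",
    "ISO-8859-1": "iso8859-1",
}


def vfat_fstab_line(device, mount_point, charmap, codepage):
    """Build an fstab entry for a VFAT filesystem (hypothetical helper)."""
    iocharset = CHARMAP_TO_IOCHARSET[charmap]
    options = "noauto,user,iocharset=%s,codepage=%d" % (iocharset, codepage)
    return "%s %s vfat %s 0 0" % (device, mount_point, options)


# ru_RU.UTF-8: charmap is UTF-8, legacy DOS codepage is 866.
print(vfat_fstab_line("/dev/fd0", "/media/floppy", "UTF-8", 866))
```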

As an alternative to specifying "iocharset" and "codepage" each time, it is possible to set the following kernel configuration parameters:

"File Systems" -> "Native Language Support" -> "Default NLS Option" to the desired iocharset value (that's for ISOFS and SMBFS),

"File Systems" -> "DOS/FAT/NT Filesystems" -> "Default {iocharset,codepage} for FAT" to the corresponding iocharset and codepage.

WARNING: use of the "utf8" iocharset makes filenames on VFAT filesystems case-sensitive.

--
Alexander E. Patrakov
--
http://linuxfromscratch.org/mailman/listinfo/lfs-dev
FAQ: http://www.linuxfromscratch.org/faq/
Unsubscribe: See the above information page
