Hi,
At Fri, 02 Mar 2001 09:56:51 +0000,
Markus Kuhn <[EMAIL PROTECTED]> wrote:
> a) add a character set tagging mechanism
>
> b) simply agree that man pages should only be in ASCII or UTF-8
>
> I think that b) is both feasible and simpler. Reasons:
>
> - Non-English man pages usually come as a single big package and the
> documentation says what encoding is used for the entire man package. It
> is nearly trivial for distribution makers to simply send all of that
> through iconv before putting it into their man RPMs.
This simply does not match the facts.
It is true that we have collection packages of translated manpages.
(For example, Debian 2.2 has German, Spanish, Finnish, French,
Hungarian, Italian, Japanese, Korean, and Polish manpage collections.)
I think you are talking about such collections. However, there are
also many software packages that include non-English manpages written
by non-English-speaking members of their own development teams, just
as each package carries its own '.po' files. Converting all of these
would therefore be a large amount of labor.
> - Practically all downloadable applications that users might want to
> download and install themselves instead of from the distribution are
> written in English and use only ASCII. I can count the counter
> examples with the fingers on a single hand.
That is because you do not know every piece of software in the world.
(Do you know how many open-source programs exist? Of course I don't
either. SourceForge hosts 16558 projects now, but they are obviously
only a fraction of the open-source software in the world.)
I myself wrote at least three Japanese manpages that are shipped in
their corresponding software packages and are not part of any such
"Japanese manpages collection". How many fingers does your single
hand have?
> - Man page maintainers do not need to use a UTF-8 editor. They can
> keep things in their traditional encoding and just add to their
> Makefiles an option to apply iconv at installation time.
You have not explained why your approach is better, even though it is
your approach that needs this "iconv" step. Mine does not need it.
> - On Linux distributions, there is usually only one single application
> (groff) reading man pages, and there are only very few applications
> (man, xman, etc.) calling groff to do that.
Yes, and thus implementing a mechanism in the manpage-reader software
to read an encoding tag is easy. Such a mechanism should have a good
default, so that we can replace groff without having to modify any
existing manpages. My proposal handles this well.
I do not insist that all manpages must carry an encoding tag; I feel
you misunderstand my opinion on this point.
In short: because there are many manpages but only a few programs
that read them, it is easier to modify the software than the
manpages. Moreover, each manpage has to be written in _one of_ the
existing encodings, while groff can support _multiple_ encodings.
There is no reason to restrict it.
I do not see the merit of your proposal. Your way needs
- conversion of the collected manpages,
- re-education of manpage writers all over the world, and
- a sudden (not gradual) migration to UTF-8, because your proposal
  does not support manpages written in EUC-JP, ISO 8859-1,
  ISO 8859-2, KOI8-R, and so on,
while all my proposal needs is a rewrite of groff (which yours, of
course, needs too). On the other hand, my proposal
- supports all existing manpages without conversion,
- does not require the impossible re-education of manpage writers, and
- supports any encoding for manpage writers (including UTF-8).
It is obvious that my proposal is better and can be accepted by real
users all over the world.
Your proposal is just a plot, or at least radicalism, aimed at
killing non-UTF-8 encodings right now.
> -Tplaintext Plain text (charset according to locale)
> -Tsgrtext Plain text with added ISO 6429 SGR (ESC [ ... m) emphasis
> (charset according to locale)
> -Tbstext Plain text with added backspace emphasis
> (bold and underline only, charset according to locale)
I am not interested in the 'bs' emphasis mechanism. I will agree with
you if your mechanism works without harming the other parts.
> Transliteration is indeed a problem.
I think transliteration should not be done _after_ typesetting; that
is impossible. Instead, a different macro set should be supplied for
each encoding. Since Werner and I agreed to push all encoding-related
difficulties out into the pre- and postprocessors, the preprocessor
must determine the output encoding and tell troff which macro set to
use for it.
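That division of labour can be sketched roughly as follows. The
macro-set names and the mapping table here are invented for the
example only; they are not real groff files or an agreed interface:

```python
import codecs

# Invented illustration: which macro set troff should load for a
# given output encoding (the names are hypothetical).
MACRO_SET_FOR = {
    "utf-8": "tmac.utf8",
    "euc_jp": "tmac.ja",
    "iso8859-1": "tmac.latin1",
}

def preprocess(raw, input_encoding, output_encoding):
    """Do the encoding work outside troff: convert the source page to
    the output encoding and report which macro set fits it."""
    text = raw.decode(input_encoding)          # decode the source page
    converted = text.encode(output_encoding)   # the "iconv" step
    canonical = codecs.lookup(output_encoding).name
    macro_set = MACRO_SET_FOR.get(canonical, "tmac.latin1")
    return converted, macro_set
```

The point is that troff itself never inspects bytes: it just receives
already-converted input plus the name of the macro set to load.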
> > For manpage writers: I think non-English pages may use non-ASCII
> > characters which native speakers can accept. However, English manpages
> > should be written within ASCII characters, not in ISO-8859-1. This
> > is because English manpages are for all people over the world, while
> > non-English ones are for native speakers.
>
> I hope you don't want to suggest this as the permanent situation for the
> long-term future.
I agree this is not a permanent solution. However, you agree that,
for now, non-ASCII characters may be lost or displayed wrongly, don't
you? I think non-ASCII characters should not be used for important
descriptions in manpages for many years to come. This depends on the
consideration, or kindness, of manpage writers.
However, if there were content that could not be expressed in ASCII,
and whose ASCII transliteration would cause a fatal misunderstanding
of the manpage, that would be a serious problem. Please show me
examples, if any exist. (People's names? No. I have to accept the
compromise that my name is written in the ASCII Latin alphabet when I
write English manpages, and even when I write this mail. People who
have non-ASCII letters in their names should also compromise on this
point.)
---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/