Kaixo!
On Tue, Apr 10, 2001 at 02:40:51PM +0200, Bruno Haible wrote:
>> The problem with man pages is that they don't declare their encoding; that
>> will be a big problem when people start switching to utf-8.
[the proposed algo]
> Sounds complicated indeed. Why would you convert the man page itself
> to UTF-8, when groff already has an option (-Tutf8) to produce UTF-8
> output?
That's not the problem.
The problem is that it is *impossible* to know for sure which encoding
is used in the source troff file; there is no charset=.... line.
So some heuristics are needed to guess it.
Hopefully the choice will only be between the traditional encoding
(e.g. koi8-r for 'ru', iso-8859-2 for 'pl', etc.) and utf-8
(telling koi8-r from cp1251 would be hopeless, imho), and utf-8 has
the nice property of being quite easy to detect from the special
patterns of its 8-bit bytes.
CJK man pages have an additional difficulty, as they may exist in an
8-bit encoding (e.g. euc_jp, euc-kr, 8-bit gb2312) or in a 7-bit
iso-2022-* encoding; but the latter can be recognized by its special
escape sequences.
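
To make the heuristic described above concrete, here is a minimal Python
sketch. The function name guess_encoding, the LEGACY_BY_LANG table and the
order of the checks are assumptions made for illustration; this is not
something man or groff does.

```python
# Hypothetical table: the traditional 8-bit encoding assumed per language.
LEGACY_BY_LANG = {"ru": "koi8-r", "pl": "iso-8859-2"}


def guess_encoding(data: bytes, lang: str) -> str:
    """Guess the encoding of a man page source from its raw bytes."""
    # iso-2022-* encodings are 7-bit but announce themselves with ESC
    # designation sequences such as ESC $ ... or ESC ( ...
    if b"\x1b$" in data or b"\x1b(" in data:
        return "iso-2022"
    # Pure 7-bit text reads the same in every remaining candidate.
    if all(byte < 0x80 for byte in data):
        return "ascii"
    # utf-8 multi-byte sequences follow strict patterns, so text in a
    # traditional 8-bit encoding rarely decodes as valid utf-8 by accident.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Otherwise fall back on the traditional encoding for the language
    # (iso-8859-1 as the default is an assumption of this sketch).
    return LEGACY_BY_LANG.get(lang, "iso-8859-1")
```

Note that the sketch does not even try to tell koi8-r from cp1251: once the
bytes are not valid utf-8, it simply trusts the language of the page.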
> I'd suggest:
> - Assume the manpages are in traditional format.
But what happens when utf-8 encoded pages start to appear?
(And once people start using utf-8 as their default encoding you will
see such cases, as translators will release their translated man pages
in whatever encoding they use by default.)
> The groff
> developers will have to define how UTF-8 manpages shall define
> their encoding.
Indeed, that would be a very nice thing; and it should even be extended
to all encodings, just as html pages and *.po files declare their encoding.
It is the only way to properly solve the problem.
Can we reasonably expect such a solution to be defined and put into
wide use in the near future? (I have no idea, so I ask.)
If so, that would indeed be a much better solution.
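
For comparison, a *.po file states its charset in a Content-Type header
entry, so a tool can simply read it instead of guessing. A minimal Python
sketch of that lookup follows; the name declared_charset is hypothetical,
and real tools such as gettext parse the header more carefully than this.

```python
import re

# Matches the charset parameter of a Content-Type header line.
_CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def declared_charset(data: bytes):
    """Return the charset named in the data, or None if none is declared."""
    match = _CHARSET_RE.search(data)
    return match.group(1).decode("ascii").lower() if match else None
```

A troff source has no such line today, so this lookup returns None for it
and the reader is back to heuristics.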
> Now all that will remain to be done is to fix 'more' and 'less' to
> correctly display the resulting UTF-8 encoded output.
What is the problem? I can use less and more to view utf-8 files
without any trouble.
>
> Bruno
--
Ki ça vos våye bén,
Pablo Saratxaga
http://www.srtxg.easynet.be/ PGP Key available, key ID: 0x8F0E4975
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/