Package: groff
Version: 1.22.4-8
Severity: normal

When groff is run with -k on a file which contains only a single
non-ASCII UTF-8 character, preconv misdetects the text as some other
encoding, even though the locale in use is UTF-8.  Since UTF-8 is nearly
universally used for text files on Unix, this leads to bizarre behaviour
and misencoded output.

For example, given the first file below ("broken"), groff prints a
warning and then proceeds to insert an incorrect character.  However,
when a second non-ASCII UTF-8 character is included, as in the second
file ("working"), the output is correct.

My recommendation here is that when detecting character sets, if the
data is valid UTF-8, then UTF-8 be used as the encoding.  The uchardet
detection of "MAC-CENTRALEUROPE" may be acceptable for some web pages,
where encoding can be specified explicitly at the HTTP level, but it is
not a prudent choice for documents on Debian (which has never supported
this as a valid system encoding) in 2022.  I very much doubt it would be
a prudent choice on macOS in 2022 either, since macOS, as I understand
it, has used UTF-8 exclusively since 10.0, released over two decades
ago.
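
To make the recommendation concrete, here is a minimal sketch in Python
of the rule I have in mind.  It is not preconv's actual code: the names
choose_encoding and fallback_detector are hypothetical, and the fallback
merely stands in for a statistical detector such as uchardet.

```python
from typing import Callable, Optional


def choose_encoding(
    data: bytes,
    fallback_detector: Optional[Callable[[bytes], str]] = None,
) -> str:
    """Return "UTF-8" whenever the data decodes cleanly as UTF-8.

    fallback_detector is a hypothetical stand-in for a statistical
    detector; it is consulted only when the data is NOT valid UTF-8.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        if fallback_detector is not None:
            return fallback_detector(data)
        return "unknown"  # placeholder result, an assumption of this sketch
    return "UTF-8"


# One non-ASCII character (U+00C9), as in the "broken" file.
one_char = "\u00c9tats-Unis\n".encode("utf-8")
# The same text in a legacy single-byte encoding is not valid UTF-8.
legacy = "\u00c9tats-Unis\n".encode("latin-1")

print(choose_encoding(one_char))
print(choose_encoding(legacy, fallback_detector=lambda _data: "guessed"))
```

With this ordering, a statistical guess is only ever made for input that
cannot be UTF-8, so a file with a single accented character can no
longer be reinterpreted as a legacy encoding.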

Command line:

  LC_ALL=fr_CA.UTF-8 groff -Tps -dpaper=com10l -P-pcom10 -P-l -k envelope.me >envelope.ps

broken
----
.nf
.po 0.5c
.sp 0.5c
.ft P
Toronto City Hall
100 Queen Street W
Toronto ON M5H 2N2
Canada
.sp 2c
.in 8.5c
New York City Hall
1 City Hall
New York NY 10007-1298
États-Unis
----

working
----
.nf
.po 0.5c
.sp 0.5c
.ft P
Hôtel de Ville de Toronto
100 Rue Queen O
Toronto ON M5H 2N2
Canada
.sp 2c
.in 8.5c
New York City Hall
1 City Hall
New York NY 10007-1298
États-Unis
----
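
The only difference that matters between the two files is the number of
non-ASCII characters.  The following Python sketch counts them; the
helper name non_ascii is hypothetical, only the relevant lines of each
file are reproduced, and it assumes the accented letters are stored in
precomposed form (U+00C9 and U+00F4).

```python
def non_ascii(text: str) -> list:
    """Return the non-ASCII characters in text, in order."""
    return [ch for ch in text if ord(ch) > 0x7F]


# Only the lines that differ or contain non-ASCII text are reproduced.
broken = "Toronto City Hall\n\u00c9tats-Unis\n"
working = "H\u00f4tel de Ville de Toronto\n\u00c9tats-Unis\n"

for name, text in (("broken", broken), ("working", working)):
    chars = non_ascii(text)
    # Show each character's UTF-8 byte sequence in hexadecimal.
    encoded = [ch.encode("utf-8").hex() for ch in chars]
    print(name, len(chars), encoded)
```

Under those assumptions, the "broken" file has one non-ASCII character
(bytes c3 89) and the "working" file has two (c3 b4 and c3 89); both
files are valid UTF-8.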


-- System Information:
Debian Release: bookworm/sid
  APT prefers stable-security
  APT policy: (500, 'stable-security'), (500, 'unstable'), (500, 'stable'), (500, 'oldstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.15.0-3-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_WARN
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages groff depends on:
ii  groff-base  1.22.4-8
ii  libc6       2.33-7
ii  libgcc-s1   12-20220319-1
ii  libstdc++6  12-20220319-1
ii  libx11-6    2:1.7.5-1
ii  libxaw7     2:1.0.14-1
ii  libxmu6     2:1.1.3-3
ii  libxt6      1:1.2.1-1

Versions of packages groff recommends:
ii  ghostscript                      9.56.0~dfsg-1
ii  imagemagick                      8:6.9.11.60+dfsg-1.3+b2
ii  imagemagick-6.q16 [imagemagick]  8:6.9.11.60+dfsg-1.3+b2
ii  libpaper1                        1.1.28+b1
ii  netpbm                           2:10.97.00-2
ii  perl                             5.34.0-3
ii  psutils                          1.17.dfsg-4

groff suggests no packages.

-- no debconf information

-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
