gbranden pushed a commit to branch master
in repository groff.
commit 96117f9c9f6a44b3813c942748f17aff35d7041b
Author: G. Branden Robinson <[email protected]>
AuthorDate: Thu Feb 26 23:17:01 2026 -0600
src/preproc/preconv/tests/smoke-test.sh: Refactor.
...to not spuriously fail on macOS/Darwin.
* src/preproc/preconv/tests/smoke-test.sh: Refactor and extensively
annotate assumptions about the character encoding used by the "C"
locale on the host system, which influences but does not _determine_
that which preconv uses. preconv doesn't fall all the way back to
"ANSI_X3.4-1968", a.k.a. US-ASCII, because GNU troff is designed to
assume that its input uses an encoding where every 8-bit value has a
code point assignment, which is not true of the 7-bit US-ASCII. The
upshot is that preconv assumes an encoding of ISO 8859-1 in such
cases. Practically speaking, this test now assumes that preconv falls
back to ISO 8859-1 on macOS/Darwin systems just as it does on GNU
libc-based systems.
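The fallback described above can be sketched as a small shell function. This is an illustration only, not code from preconv or from the test script: the name `expected_preconv_charset` is hypothetical, the list of US-ASCII aliases is a guess at what systems might report, and passing other names through unchanged is an assumption about behavior the commit does not spell out.

```shell
# Hypothetical helper (not part of smoke-test.sh): map the charmap name
# reported for the "C" locale to the encoding the test expects preconv
# to assume.
expected_preconv_charset () {
    case "$1" in
        ANSI_X3.4-1968|US-ASCII|ASCII)
            # 7-bit US-ASCII leaves byte values 128-255 unassigned, so
            # the expectation is ISO 8859-1 (Latin-1) instead.
            echo ISO-8859-1
            ;;
        '')
            # Nothing was reported; guess UTF-8, as the script does for
            # systems it does not recognize.
            echo UTF-8
            ;;
        *)
            # Assumption: any other name is taken at face value.
            echo "$1"
            ;;
    esac
}

expected_preconv_charset ANSI_X3.4-1968
```

The aliases in the first branch are illustrative; only "ANSI_X3.4-1968" is named in the commit message.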
It's a pity that the equivalent of the shell command "locale charmap"
(supported by glibc) is not standardized by POSIX.
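For readers unfamiliar with the command, a minimal sketch of probing the "C" locale's charmap from a shell script follows. The function name `c_locale_charmap` and the "unknown" placeholder are hypothetical, and the sketch assumes only that a `locale` command, where present, accepts the `charmap` operand; what it prints varies by system.

```shell
# Hypothetical helper: report the charmap name of the "C" locale, or
# "unknown" when the host has no usable `locale` command or the query
# yields nothing.
c_locale_charmap () {
    name=
    if command -v locale > /dev/null 2>&1
    then
        # Force the "C" locale for this one invocation only.
        name=$(LC_ALL=C locale charmap 2> /dev/null) || name=
    fi
    echo "${name:-unknown}"
}

c_locale_charmap
```

Per the commit message, glibc reports "ANSI_X3.4-1968" here; other systems may report something else or nothing at all, which is why the test script guesses.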
Thanks to Alexis Hildebrant for the report in
<https://lists.gnu.org/archive/html/groff/2026-02/msg00089.html> and for
field-testing revisions to the script, and to him and John Gardner for
discussion.
---
ChangeLog | 23 ++++++++++++++++++++
src/preproc/preconv/tests/smoke-test.sh | 38 ++++++++++++++++++++++++---------
2 files changed, 51 insertions(+), 10 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index c2ad64225..4ff421944 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,26 @@
+2026-02-27 G. Branden Robinson <[email protected]>
+
+ * src/preproc/preconv/tests/smoke-test.sh: Refactor and
+ extensively annotate assumptions about the character encoding
+ used by the "C" locale on the host system, which influences but
+ does not _determine_ that which preconv uses. preconv doesn't
+ fall all the way back to "ANSI_X3.4-1968", a.k.a. US-ASCII,
+ because GNU troff is designed to assume that its input uses an
+ encoding where every 8-bit value has a code point assignment,
+ which is not true of the 7-bit US-ASCII. The upshot is that
+ preconv assumes an encoding of ISO 8859-1 in such cases.
+ Practically speaking, this test now assumes that preconv falls
+ back to ISO 8859-1 on macOS/Darwin systems just as it does on
+ GNU libc-based systems.
+
+ It's a pity that the equivalent of the shell command "locale
+ charmap" (supported by glibc) is not standardized by POSIX.
+
+ Thanks to Alexis Hildebrant for the report in
+ <https://lists.gnu.org/archive/html/groff/2026-02/msg00089.html>
+ and for field-testing revisions to the script, and to him and
+ John Gardner for discussion.
+
2026-02-22 G. Branden Robinson <[email protected]>
* src/roff/groff/tests/\
diff --git a/src/preproc/preconv/tests/smoke-test.sh b/src/preproc/preconv/tests/smoke-test.sh
index 281b51f82..b942709ab 100755
--- a/src/preproc/preconv/tests/smoke-test.sh
+++ b/src/preproc/preconv/tests/smoke-test.sh
@@ -92,27 +92,45 @@ printf 'Eat at the caf\351.\n' \
test -z "$fail" || exit
-has_glibc=
-
-if command -v locale > /dev/null
-then
- has_glibc=yes
-fi
-
# Fall back to the locale.
#
+# It's hard to determine the character encoding of the 'C' locale
+# because the only POSIX-standard way to do so is to build a C program
+# to call `nl_langinfo(CODESET)`. There's also no POSIX-standard way
+# to ask a system to report the byte sequence it uses to encode, say,
+# "lowercase e with acute accent".
+#
+# (I think Perl can do that, though.)
+#
+# We're just a shell script, so on non-glibc systems, we guess at it.
+#
# On glibc systems, the 'C' locale uses "ANSI_X3.4-1968" for the
-# character set, but preconv assumes Latin-1 instead of US-ASCII.
+# character set, and `locale charmap` tells us as much, but preconv
+# assumes Latin-1 instead of US-ASCII, so we override that.
+#
+# On Darwin (macOS) systems, we do the same. See
+# <https://lists.gnu.org/archive/html/groff/2026-02/msg00129.html>.
#
-# On non-glibc systems, who knows? But at least some use UTF-8.
+# For everything else, we assume UTF-8.
-if [ -n "$has_glibc" ]
+libc_vendor=
+
+if command -v locale > /dev/null
then
+ libc_vendor=gnu
+ charset=ISO-8859-1
+elif [ "$(uname -s)" = "Darwin" ]
+then
+ libc_vendor=apple
charset=ISO-8859-1
else
+ libc_vendor=unknown
charset=UTF-8
fi
+printf "standard C library vendor: %s;" $libc_vendor >&2
+printf " expecting preconv character encoding %s\n" $charset >&2
+
echo "testing fallback to locale setting in environment" >&2
printf 'Eat at the caf\351.\n' \
| "$preconv" -d 2>&1 > /dev/null \