gbranden pushed a commit to branch master
in repository groff.

commit 96117f9c9f6a44b3813c942748f17aff35d7041b
Author: G. Branden Robinson <[email protected]>
AuthorDate: Thu Feb 26 23:17:01 2026 -0600

    src/preproc/preconv/tests/smoke-test.sh: Refactor.
    
    ...to not spuriously fail on macOS/Darwin.
    
    * src/preproc/preconv/tests/smoke-test.sh: Refactor and extensively
      annotate assumptions about the character encoding used by the "C"
      locale on the host system, which influences but does not _determine_
      that which preconv uses.  preconv doesn't fall all the way back to
      "ANSI_X3.4-1968", a.k.a. US-ASCII, because GNU troff is designed to
      assume that its input uses an encoding where every 8-bit value has a
      code point assignment, which is not true of the 7-bit US-ASCII.  The
      upshot is that preconv assumes an encoding of ISO 8859-1 in such
      cases.  Practically speaking, this test now assumes that preconv falls
      back to ISO 8859-1 on macOS/Darwin systems just as it does on GNU
      libc-based systems.
    
    It's a pity that the equivalent of the shell command "locale charmap"
    (supported by glibc) is not standardized by POSIX.
    
    Thanks to Alexis Hildebrant for the report in
    <https://lists.gnu.org/archive/html/groff/2026-02/msg00089.html> and for
    field-testing revisions to the script, and to him and John Gardner for
    discussion.
---
 ChangeLog                               | 23 ++++++++++++++++++++
 src/preproc/preconv/tests/smoke-test.sh | 38 ++++++++++++++++++++++++---------
 2 files changed, 51 insertions(+), 10 deletions(-)
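
Reader's note, not part of the commit: the charset selection that this
change adds to smoke-test.sh can be restated as a small shell function.
What follows is only a sketch under stated assumptions.  The name
guess_preconv_charset and its two parameters are hypothetical; they are
introduced here so that the decision can be exercised without depending
on the host system.

```sh
# Hypothetical helper; not part of groff or of this commit.
#
# Usage: guess_preconv_charset HAS_LOCALE_CMD KERNEL_NAME
#   HAS_LOCALE_CMD  "yes" if `command -v locale` succeeded
#   KERNEL_NAME     the output of `uname -s`
guess_preconv_charset () {
    if [ "$1" = yes ] || [ "$2" = Darwin ]
    then
        # The test script treats the presence of a `locale` command
        # as a sign of glibc, and handles Darwin the same way: it
        # expects preconv to fall back to ISO 8859-1.
        echo ISO-8859-1
    else
        # Everything else is assumed to use UTF-8.
        echo UTF-8
    fi
}
```

A caller could set has_locale to "yes" when `command -v locale`
succeeds and then run
`guess_preconv_charset "$has_locale" "$(uname -s)"`.  Passing the two
facts in as arguments, rather than probing inside the function, is a
design choice made for this sketch only; the script in the diff below
probes the system inline.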

diff --git a/ChangeLog b/ChangeLog
index c2ad64225..4ff421944 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,26 @@
+2026-02-27  G. Branden Robinson <[email protected]>
+
+       * src/preproc/preconv/tests/smoke-test.sh: Refactor and
+       extensively annotate assumptions about the character encoding
+       used by the "C" locale on the host system, which influences but
+       does not _determine_ that which preconv uses.  preconv doesn't
+       fall all the way back to "ANSI_X3.4-1968", a.k.a. US-ASCII,
+       because GNU troff is designed to assume that its input uses an
+       encoding where every 8-bit value has a code point assignment,
+       which is not true of the 7-bit US-ASCII.  The upshot is that
+       preconv assumes an encoding of ISO 8859-1 in such cases.
+       Practically speaking, this test now assumes that preconv falls
+       back to ISO 8859-1 on macOS/Darwin systems just as it does on
+       GNU libc-based systems.
+
+       It's a pity that the equivalent of the shell command "locale
+       charmap" (supported by glibc) is not standardized by POSIX.
+
+       Thanks to Alexis Hildebrant for the report in
+       <https://lists.gnu.org/archive/html/groff/2026-02/msg00089.html>
+       and for field-testing revisions to the script, and to him and
+       John Gardner for discussion.
+
 2026-02-22  G. Branden Robinson <[email protected]>
 
        * src/roff/groff/tests/\
diff --git a/src/preproc/preconv/tests/smoke-test.sh b/src/preproc/preconv/tests/smoke-test.sh
index 281b51f82..b942709ab 100755
--- a/src/preproc/preconv/tests/smoke-test.sh
+++ b/src/preproc/preconv/tests/smoke-test.sh
@@ -92,27 +92,45 @@ printf 'Eat at the caf\351.\n' \
 
 test -z "$fail" || exit
 
-has_glibc=
-
-if command -v locale > /dev/null
-then
-    has_glibc=yes
-fi
-
 # Fall back to the locale.
 #
+# It's hard to determine the character encoding of the 'C' locale
+# because the only POSIX-standard way to do so is to build a C program
+# to call `nl_langinfo(CODESET)`.  There's also no POSIX-standard way
+# to ask a system to report the byte sequence it uses to encode, say,
+# "lowercase e with acute accent".
+#
+# (I think Perl can do that, though.)
+#
+# We're just a shell script, so on non-glibc systems, we guess at it.
+#
 # On glibc systems, the 'C' locale uses "ANSI_X3.4-1968" for the
-# character set, but preconv assumes Latin-1 instead of US-ASCII.
+# character set, and `locale charmap` tells us as much, but preconv
+# assumes Latin-1 instead of US-ASCII, so we override that.
+#
+# On Darwin (macOS) systems, we do the same.  See
+# <https://lists.gnu.org/archive/html/groff/2026-02/msg00129.html>.
 #
-# On non-glibc systems, who knows?  But at least some use UTF-8.
+# For everything else, we assume UTF-8.
 
-if [ -n "$has_glibc" ]
+libc_vendor=
+
+if command -v locale > /dev/null
 then
+    libc_vendor=gnu
+    charset=ISO-8859-1
+elif [ "$(uname -s)" = "Darwin" ]
+then
+    libc_vendor=apple
     charset=ISO-8859-1
 else
+    libc_vendor=unknown
     charset=UTF-8
 fi
 
+printf "standard C library vendor: %s;" $libc_vendor >&2
+printf " expecting preconv character encoding %s\n" $charset >&2
+
 echo "testing fallback to locale setting in environment" >&2
 printf 'Eat at the caf\351.\n' \
     | "$preconv" -d 2>&1 > /dev/null \

_______________________________________________
groff-commit mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/groff-commit
