Pádraig Brady <p...@draigbrady.com> writes:

>> Thanks, I forgot about that function. That sounds like a good idea to
>> me. We can be nice to people who do not use glibc.
>> We will have to hoist the 'posixly_correct' check out of it first,
>> though. Technically POSIX says that 'fold -s' should only break at
>> <blank> characters. But I would rather avoid adding more
>> getenv ("POSIXLY_CORRECT") calls to programs that do not yet have them.
>
> Yes I agree that fold should not depend on POSIXLY_CORRECT,
> so c32isnbspace() should only look at the passed char.

This patch should do the trick. It fixes the failure on Solaris 11.4
(cfarm215). I couldn't reproduce the failure seen on the CI machines in
my NetBSD 10 VM, but I see no reason why this fix wouldn't work there too.
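
For anyone skimming, here is a minimal standalone sketch of the check the
patch adds to fold_file. It is not the coreutils code: 'is_nbspace' mirrors
the code points tested by c32isnbspace, while 'lenient_isblank' and
'can_break_at' are hypothetical names, with 'lenient_isblank' standing in
for a libc whose c32isblank accepts non-breaking spaces (as reported for
Solaris 11.4 and NetBSD 10).

```c
#include <stdbool.h>
#include <stdint.h>

/* Same four code points as c32isnbspace in the patch:
   U+00A0, U+2007, U+202F, U+2060.  */
static inline bool
is_nbspace (uint_least32_t wc)
{
  return wc == 0x00A0 || wc == 0x2007 || wc == 0x202F || wc == 0x2060;
}

/* Hypothetical stand-in for a libc that classifies non-breaking
   spaces as blank, unlike glibc.  */
static inline bool
lenient_isblank (uint_least32_t wc)
{
  return wc == ' ' || wc == '\t' || is_nbspace (wc);
}

/* The 'fold -s' break test after the patch: the character must be
   blank, and must not be a non-breaking space.  */
static inline bool
can_break_at (uint_least32_t wc)
{
  return lenient_isblank (wc) && ! is_nbspace (wc);
}
```

With this, can_break_at (' ') is true and can_break_at (0x00A0) is false
even though lenient_isblank (0x00A0) is true, so the result no longer
depends on how the platform classifies those characters.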

Will push it tomorrow.

Collin

From b161b6f6759f2587b82972aee563f197e24f5351 Mon Sep 17 00:00:00 2001
Message-ID: <b161b6f6759f2587b82972aee563f197e24f5351.1756872209.git.collin.fu...@gmail.com>
From: Collin Funk <collin.fu...@gmail.com>
Date: Tue, 2 Sep 2025 20:08:20 -0700
Subject: [PATCH] fold: check that characters are not non-breaking spaces when
 -s is used

NetBSD 10 and Solaris 11.4 treat non-breaking spaces as blank
characters unlike glibc.

* src/system.h: Include uchar.h.
(c32isnbspace): New function based on iswnbspace from src/wc.c.
* src/wc.c (iswnbspace): Remove function.
(wc, main): Use the new c32isnbspace function.
* src/fold.c (fold_file): Likewise.
Fixes https://bugs.gnu.org/79300
---
 src/fold.c   |  2 +-
 src/system.h |  9 +++++++++
 src/wc.c     | 18 +++++-------------
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/src/fold.c b/src/fold.c
index 5f71d5c55..b90bc7d80 100644
--- a/src/fold.c
+++ b/src/fold.c
@@ -216,7 +216,7 @@ fold_file (char const *filename, size_t width)
                   for (mcel_t g2; logical_p < logical_lim; logical_p += g2.len)
                     {
                       g2 = mcel_scan (logical_p, logical_lim);
-                      if (c32isblank (g2.ch))
+                      if (c32isblank (g2.ch) && ! c32isnbspace (g2.ch))
                         {
                           space_length = g2.len;
                           logical_end = logical_p - line_out;
diff --git a/src/system.h b/src/system.h
index 5cb751cc8..2296c8bbb 100644
--- a/src/system.h
+++ b/src/system.h
@@ -70,6 +70,7 @@
 #include <stdckdint.h>
 #include <stddef.h>
 #include <string.h>
+#include <uchar.h>
 #include <errno.h>
 
 /* Some systems don't define this; POSIX mentions it but says it is
@@ -148,6 +149,14 @@ enum
    errors that the cast doesn't.  */
 static inline unsigned char to_uchar (char ch) { return ch; }
 
+/* Return non zero if a non breaking space.  */
+ATTRIBUTE_PURE
+static inline int
+c32isnbspace (char32_t wc)
+{
+  return wc == 0x00A0 || wc == 0x2007 || wc == 0x202F || wc == 0x2060;
+}
+
 #include <locale.h>
 
 /* Take care of NLS matters.  */
diff --git a/src/wc.c b/src/wc.c
index 05e78676e..d0723d812 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -191,16 +191,6 @@ the following order: newline, word, character, byte, maximum line length.\n\
   exit (status);
 }
 
-/* Return non zero if a non breaking space.  */
-ATTRIBUTE_PURE
-static int
-iswnbspace (wint_t wc)
-{
-  return ! posixly_correct
-         && (wc == 0x00A0 || wc == 0x2007
-             || wc == 0x202F || wc == 0x2060);
-}
-
 /* FILE is the name of the file (or null for standard input)
    associated with the specified counters.  */
 static void
@@ -525,8 +515,9 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
                           if (width > 0)
                             linepos += width;
                         }
-                      in_word2 = ! iswspace (wide_char)
-                                 && ! iswnbspace (wide_char);
+                      in_word2 = (! iswspace (wide_char)
+                                  && !(! posixly_correct
+                                       && c32isnbspace (wide_char)));
                     }
 
                   /* Count words by counting word starts, i.e., each
@@ -798,7 +789,8 @@ main (int argc, char **argv)
       wc_isprint[i] = !!isprint (i);
   if (print_words)
     for (int i = 0; i <= UCHAR_MAX; i++)
-      wc_isspace[i] = isspace (i) || iswnbspace (btoc32 (i));
+      wc_isspace[i] = isspace (i) || (! posixly_correct
+                                      && c32isnbspace (btoc32 (i)));
 
   bool read_tokens = false;
   struct argv_iterator *ai;
-- 
2.51.0
