Pádraig Brady <p...@draigbrady.com> writes:

>> This patch should do the trick. It fixes it on Solaris 11.4
>> (cfarm215).
>> I couldn't reproduce the failure seen on the CI machines in my NetBSD 10
>> VM. But I see no reason why this fix wouldn't work there too.
>> Will push it tomorrow.
>
> I would have left iswnbspace() in wc.c, calling into c32isnbspace(),
> otherwise the double negative with posixly_correct is awkward.
> Anyway the logic looks good.

I was about 50/50 on whether the double negation was too ugly to use. :)

I'll leave the function there, but name it maybe_c32isnbspace(), since I
don't want the function to be misunderstood as a wchar_t function.
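
To illustrate the trade-off, here is a rough, self-contained sketch of the
two call-site shapes. The wrapper matches the patch below; the "inline"
form is only my guess at what the call site would look like without the
wrapper, and in_word_wrapper, in_word_inline, and the is_space parameter
are hypothetical names for this sketch. It uses uint_least32_t in place of
char32_t (the same underlying type in C11) to avoid needing uchar.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for wc's global flag; in wc it reflects POSIXLY_CORRECT.  */
static bool posixly_correct = false;

/* Policy-free predicate with the same four code points as the patch.  */
static int
c32isnbspace (uint_least32_t wc)
{
  return wc == 0x00A0 || wc == 0x2007 || wc == 0x202F || wc == 0x2060;
}

/* wc-local wrapper, as in the patch.  */
static int
maybe_c32isnbspace (uint_least32_t wc)
{
  return ! posixly_correct && c32isnbspace (wc);
}

/* Call site using the wrapper.  IS_SPACE stands for the result of the
   existing whitespace test (hypothetical parameter).  */
static bool
in_word_wrapper (bool is_space, uint_least32_t wc)
{
  return ! is_space && ! maybe_c32isnbspace (wc);
}

/* Assumed shape without the wrapper: the posixly_correct test is folded
   into the call site, which is where the awkward negation shows up.  */
static bool
in_word_inline (bool is_space, uint_least32_t wc)
{
  return ! is_space && (posixly_correct || ! c32isnbspace (wc));
}
```

Both forms compute the same result; the wrapper just keeps the
POSIXLY_CORRECT policy out of the condition at each call site.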

Pushed the attached two patches. The second fixes a 'make syntax-check'
failure. Will close this bug now.
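
For reference, the effect of the fold change can be sketched outside of
coreutils roughly as follows. This is only an illustration: demo_c32isblank
is a hypothetical stand-in that mimics a libc which classifies U+00A0 as
blank (as described for NetBSD 10 and Solaris 11.4 in the commit message),
last_break is a made-up helper, and uint_least32_t again stands in for
char32_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for c32isblank() on a platform that treats
   U+00A0 as a blank character.  */
static int
demo_c32isblank (uint_least32_t wc)
{
  return wc == ' ' || wc == '\t' || wc == 0x00A0;
}

/* Same four code points as c32isnbspace in the patch.  */
static int
c32isnbspace (uint_least32_t wc)
{
  return wc == 0x00A0 || wc == 0x2007 || wc == 0x202F || wc == 0x2060;
}

/* Return the index of the last character in S[0..N-1] at which a line
   may be broken, or -1 if there is none.  A non-breaking space is never
   a break point, even when the platform says it is blank.  */
static long
last_break (uint_least32_t const *s, size_t n)
{
  long found = -1;
  for (size_t i = 0; i < n; i++)
    if (demo_c32isblank (s[i]) && ! c32isnbspace (s[i]))
      found = (long) i;
  return found;
}
```

With the input 'a', ' ', 'b', U+00A0, 'c', the extra check makes the break
land on the ordinary space rather than on the non-breaking one.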

Collin

P.S. I actually just noticed this unchanged hunk in my diff:

$ git ls-files | grep -E '\.[ch]' | xargs grep -F 'isw'
src/wc.c:                      in_word2 = (! iswspace (wide_char)

Okay to change this one to the c32 variant?

Collin

From 3a81d44d43b078ee20f1ce2b907c23d0926070b3 Mon Sep 17 00:00:00 2001
Message-ID: <3a81d44d43b078ee20f1ce2b907c23d0926070b3.1756952725.git.collin.fu...@gmail.com>
From: Collin Funk <collin.fu...@gmail.com>
Date: Tue, 2 Sep 2025 20:08:20 -0700
Subject: [PATCH 1/2] fold: check that characters are not non-breaking spaces
 when -s is used

NetBSD 10 and Solaris 11.4 treat non-breaking spaces as blank
characters unlike glibc.

* src/system.h: Include uchar.h.
(c32isnbspace): New function based on iswnbspace from src/wc.c.
* src/fold.c (fold_file): Use it.
* src/wc.c (iswnbspace): Remove function.
(maybe_c32isnbspace): New function.
(wc, main): Use it.
Fixes https://bugs.gnu.org/79300
---
 src/fold.c   |  2 +-
 src/system.h |  9 +++++++++
 src/wc.c     | 15 +++++++--------
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/src/fold.c b/src/fold.c
index 5f71d5c55..b90bc7d80 100644
--- a/src/fold.c
+++ b/src/fold.c
@@ -216,7 +216,7 @@ fold_file (char const *filename, size_t width)
                   for (mcel_t g2; logical_p < logical_lim; logical_p += g2.len)
                     {
                       g2 = mcel_scan (logical_p, logical_lim);
-                      if (c32isblank (g2.ch))
+                      if (c32isblank (g2.ch) && ! c32isnbspace (g2.ch))
                         {
                           space_length = g2.len;
                           logical_end = logical_p - line_out;
diff --git a/src/system.h b/src/system.h
index 5cb751cc8..2296c8bbb 100644
--- a/src/system.h
+++ b/src/system.h
@@ -70,6 +70,7 @@
 #include <stdckdint.h>
 #include <stddef.h>
 #include <string.h>
+#include <uchar.h>
 #include <errno.h>
 
 /* Some systems don't define this; POSIX mentions it but says it is
@@ -148,6 +149,14 @@ enum
    errors that the cast doesn't.  */
 static inline unsigned char to_uchar (char ch) { return ch; }
 
+/* Return non zero if a non breaking space.  */
+ATTRIBUTE_PURE
+static inline int
+c32isnbspace (char32_t wc)
+{
+  return wc == 0x00A0 || wc == 0x2007 || wc == 0x202F || wc == 0x2060;
+}
+
 #include <locale.h>
 
 /* Take care of NLS matters.  */
diff --git a/src/wc.c b/src/wc.c
index 05e78676e..f22f658b4 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -191,14 +191,13 @@ the following order: newline, word, character, byte, maximum line length.\n\
   exit (status);
 }
 
-/* Return non zero if a non breaking space.  */
+/* Return non zero if POSIXLY_CORRECT is not set and WC is a non breaking
+   space.  */
 ATTRIBUTE_PURE
 static int
-iswnbspace (wint_t wc)
+maybe_c32isnbspace (char32_t wc)
 {
-  return ! posixly_correct
-         && (wc == 0x00A0 || wc == 0x2007
-             || wc == 0x202F || wc == 0x2060);
+  return ! posixly_correct && c32isnbspace (wc);
 }
 
 /* FILE is the name of the file (or null for standard input)
@@ -525,8 +524,8 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
                           if (width > 0)
                             linepos += width;
                         }
-                      in_word2 = ! iswspace (wide_char)
-                                 && ! iswnbspace (wide_char);
+                      in_word2 = (! iswspace (wide_char)
+                                  && ! maybe_c32isnbspace (wide_char));
                     }
 
                   /* Count words by counting word starts, i.e., each
@@ -798,7 +797,7 @@ main (int argc, char **argv)
       wc_isprint[i] = !!isprint (i);
   if (print_words)
     for (int i = 0; i <= UCHAR_MAX; i++)
-      wc_isspace[i] = isspace (i) || iswnbspace (btoc32 (i));
+      wc_isspace[i] = isspace (i) || maybe_c32isnbspace (btoc32 (i));
 
   bool read_tokens = false;
   struct argv_iterator *ai;
-- 
2.51.0

From 022673367b7e3652410bce912a12a43c2e5f4607 Mon Sep 17 00:00:00 2001
Message-ID: <022673367b7e3652410bce912a12a43c2e5f4607.1756952725.git.collin.fu...@gmail.com>
In-Reply-To: <3a81d44d43b078ee20f1ce2b907c23d0926070b3.1756952725.git.collin.fu...@gmail.com>
References: <3a81d44d43b078ee20f1ce2b907c23d0926070b3.1756952725.git.collin.fu...@gmail.com>
From: Collin Funk <collin.fu...@gmail.com>
Date: Wed, 3 Sep 2025 19:15:49 -0700
Subject: [PATCH 2/2] maint: avoid syntax-check failure from previous commit

* src/df.c: Don't include uchar.h.
* src/ls.c: Likewise.
* src/wc.c: Likewise.
---
 src/df.c | 1 -
 src/ls.c | 1 -
 src/wc.c | 1 -
 3 files changed, 3 deletions(-)

diff --git a/src/df.c b/src/df.c
index db5287157..77576513e 100644
--- a/src/df.c
+++ b/src/df.c
@@ -23,7 +23,6 @@
 #include <sys/types.h>
 #include <getopt.h>
 #include <c-ctype.h>
-#include <uchar.h>
 
 #include "system.h"
 #include "assure.h"
diff --git a/src/ls.c b/src/ls.c
index d4aae25ca..498ae3d73 100644
--- a/src/ls.c
+++ b/src/ls.c
@@ -55,7 +55,6 @@
 #include <pwd.h>
 #include <getopt.h>
 #include <signal.h>
-#include <uchar.h>
 
 #if HAVE_LANGINFO_CODESET
 # include <langinfo.h>
diff --git a/src/wc.c b/src/wc.c
index f22f658b4..214637dcf 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -23,7 +23,6 @@
 #include <stdio.h>
 #include <getopt.h>
 #include <sys/types.h>
-#include <uchar.h>
 
 #include <argmatch.h>
 #include <argv-iter.h>
-- 
2.51.0
