Pádraig Brady <p...@draigbrady.com> writes:

>> Thanks, I forgot about that function. That sounds like a good idea
>> to me. We can be nice to people who do not use glibc.
>> We will have to hoist the 'posixly_correct' check out of it first,
>> though. Technically POSIX says that 'fold -s' should only break at
>> <blank> characters. But I would rather avoid adding more
>> getenv ("POSIXLY_CORRECT") calls to programs that do not have them yet.
>
> Yes, I agree that fold should not depend on POSIXLY_CORRECT,
> so c32isnbspace() should only look at the passed char.
This patch should do the trick. It fixes the failure on Solaris 11.4
(cfarm215). I couldn't reproduce the failure seen on the CI machines in
my NetBSD 10 VM, but I see no reason why this fix wouldn't work there
too. I will push it tomorrow.

Collin
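The design agreed above (a pure predicate that looks only at the character it is given, with the POSIXLY_CORRECT policy applied by the caller) can be sketched in isolation. This is a minimal illustration, not the coreutils code: c32isnbspace copies the body the patch adds to src/system.h (minus the gnulib ATTRIBUTE_PURE macro), while ascii_isspace and count_words are hypothetical helpers. ascii_isspace stands in for iswspace so the result does not depend on the platform's locale tables, and count_words roughly mirrors how wc counts word starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <uchar.h>

/* Same test as the c32isnbspace added by the patch: it depends only on
   its argument, never on the environment.  */
static inline int
c32isnbspace (char32_t wc)
{
  return wc == 0x00A0 || wc == 0x2007 || wc == 0x202F || wc == 0x2060;
}

/* Hypothetical stand-in for iswspace: ASCII white space only.  */
static bool
ascii_isspace (char32_t wc)
{
  return wc == ' ' || wc == '\t' || wc == '\n'
         || wc == '\v' || wc == '\f' || wc == '\r';
}

/* Hypothetical helper: count word starts in S[0..N-1].  The caller
   decides the POSIXLY_CORRECT policy and passes it in, so the
   predicate above stays pure.  */
static size_t
count_words (char32_t const *s, size_t n, bool posixly_correct)
{
  assert (n == 0 || s != NULL);
  size_t words = 0;
  bool in_word = false;
  for (size_t i = 0; i < n; i++)
    {
      /* Same shape as the new in_word2 expression in wc.c.  */
      bool in_word2 = (! ascii_isspace (s[i])
                       && ! (! posixly_correct && c32isnbspace (s[i])));
      words += ! in_word && in_word2;   /* a word starts here */
      in_word = in_word2;
    }
  return words;
}
```

For the input { 'a', U+00A0, 'b' }, count_words returns 2 when posixly_correct is false and 1 when it is true: the non-breaking space separates words only outside POSIX mode, which is the behavior wc keeps after the check is hoisted out of the predicate.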
From b161b6f6759f2587b82972aee563f197e24f5351 Mon Sep 17 00:00:00 2001
Message-ID: <b161b6f6759f2587b82972aee563f197e24f5351.1756872209.git.collin.fu...@gmail.com>
From: Collin Funk <collin.fu...@gmail.com>
Date: Tue, 2 Sep 2025 20:08:20 -0700
Subject: [PATCH] fold: check that characters are not non-breaking spaces when -s is used

NetBSD 10 and Solaris 11.4 treat non-breaking spaces as blank
characters, unlike glibc.

* src/system.h: Include uchar.h.
(c32isnbspace): New function based on iswnbspace from src/wc.c.
* src/wc.c (iswnbspace): Remove function.
(wc, main): Use the new c32isnbspace function.
* src/fold.c (fold_file): Likewise.
Fixes https://bugs.gnu.org/79300
---
 src/fold.c   |  2 +-
 src/system.h |  9 +++++++++
 src/wc.c     | 18 +++++-------------
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/src/fold.c b/src/fold.c
index 5f71d5c55..b90bc7d80 100644
--- a/src/fold.c
+++ b/src/fold.c
@@ -216,7 +216,7 @@ fold_file (char const *filename, size_t width)
           for (mcel_t g2; logical_p < logical_lim; logical_p += g2.len)
             {
               g2 = mcel_scan (logical_p, logical_lim);
-              if (c32isblank (g2.ch))
+              if (c32isblank (g2.ch) && ! c32isnbspace (g2.ch))
                 {
                   space_length = g2.len;
                   logical_end = logical_p - line_out;
diff --git a/src/system.h b/src/system.h
index 5cb751cc8..2296c8bbb 100644
--- a/src/system.h
+++ b/src/system.h
@@ -70,6 +70,7 @@
 #include <stdckdint.h>
 #include <stddef.h>
 #include <string.h>
+#include <uchar.h>
 #include <errno.h>
 
 /* Some systems don't define this; POSIX mentions it but says it is
@@ -148,6 +149,14 @@ enum
    errors that the cast doesn't.  */
 static inline unsigned char to_uchar (char ch) { return ch; }
 
+/* Return non zero if a non breaking space.  */
+ATTRIBUTE_PURE
+static inline int
+c32isnbspace (char32_t wc)
+{
+  return wc == 0x00A0 || wc == 0x2007 || wc == 0x202F || wc == 0x2060;
+}
+
 #include <locale.h>
 
 /* Take care of NLS matters.  */
diff --git a/src/wc.c b/src/wc.c
index 05e78676e..d0723d812 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -191,16 +191,6 @@ the following order: newline, word, character, byte, maximum line length.\n\
   exit (status);
 }
 
-/* Return non zero if a non breaking space.  */
-ATTRIBUTE_PURE
-static int
-iswnbspace (wint_t wc)
-{
-  return ! posixly_correct
-         && (wc == 0x00A0 || wc == 0x2007
-             || wc == 0x202F || wc == 0x2060);
-}
-
 /* FILE is the name of the file (or null for standard input)
    associated with the specified counters.  */
 static void
@@ -525,8 +515,9 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
                   if (width > 0)
                     linepos += width;
                 }
-              in_word2 = ! iswspace (wide_char)
-                         && ! iswnbspace (wide_char);
+              in_word2 = (! iswspace (wide_char)
+                          && !(! posixly_correct
+                               && c32isnbspace (wide_char)));
             }
 
           /* Count words by counting word starts, i.e., each
@@ -798,7 +789,8 @@ main (int argc, char **argv)
       wc_isprint[i] = !!isprint (i);
   if (print_words)
     for (int i = 0; i <= UCHAR_MAX; i++)
-      wc_isspace[i] = isspace (i) || iswnbspace (btoc32 (i));
+      wc_isspace[i] = isspace (i) || (! posixly_correct
+                                      && c32isnbspace (btoc32 (i)));
 
   bool read_tokens = false;
   struct argv_iterator *ai;
-- 
2.51.0
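The fold.c change can also be sketched on its own. This is a minimal illustration under stated assumptions, not the coreutils code: platform_isblank is a hypothetical stand-in for c32isblank on a system that, like NetBSD 10 or Solaris 11.4 per the commit message, classifies U+00A0 as blank, and last_break is a hypothetical helper that roughly mirrors how fold_file remembers the last blank seen while scanning a line for 'fold -s'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <uchar.h>

/* Same test as the c32isnbspace added to src/system.h by the patch.  */
static inline int
c32isnbspace (char32_t wc)
{
  return wc == 0x00A0 || wc == 0x2007 || wc == 0x202F || wc == 0x2060;
}

/* Hypothetical stand-in for c32isblank on a platform that, unlike
   glibc, reports U+00A0 NO-BREAK SPACE as blank.  */
static bool
platform_isblank (char32_t wc)
{
  return wc == U' ' || wc == U'\t' || wc == 0x00A0;
}

/* Hypothetical helper: return the index of the last position in
   S[0..N-1] where 'fold -s' may break, or N if there is none.  */
static size_t
last_break (char32_t const *s, size_t n)
{
  assert (n == 0 || s != NULL);
  size_t pos = n;
  for (size_t i = 0; i < n; i++)
    /* The extra test keeps non-breaking spaces from becoming break
       points even when the platform calls them blank.  */
    if (platform_isblank (s[i]) && ! c32isnbspace (s[i]))
      pos = i;
  return pos;
}
```

With the line { 'a', ' ', 'b', U+00A0, 'c' }, last_break returns 1 (the ASCII space). Without the ! c32isnbspace test it would return 3, so 'fold -s' could split the line at the non-breaking space, which is the behavior the patch removes on such platforms.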