On 29/08/2025 05:23, Collin Funk wrote:
Pádraig Brady <p...@draigbrady.com> writes:

Perhaps the techniques from tests/wc/wc-nbsp.sh could be used?
Maybe something like:

check_space() {
   char="$1"
   # Use -L to determine whether NBSP is printable.
   # FreeBSD 11 and OS X treat NBSP as non printable ?
   test "$(env printf "=$char=" | wc -L)" = 3 &&
     test $(env printf "=$char=" | wc -w) = 2
}

if check_space '\u2007'; then
   ...
fi

Thanks for the suggestion, but that doesn't work. Any issue with
skipping based on $host_os for this test and for fold-spaces.sh?

I was thinking of testing "printf '\u00A0' | ./src/tr -d '[:blank:]'"
but that won't work since 'tr' operates on bytes and U+00A0 is
represented as 0xc2 0xa0 in UTF-8.
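
The byte representation can be checked with a small sketch like the one below. The helper name nbsp_hex is made up for illustration, and the character is written as octal escapes so that no \u support in printf is assumed:

```shell
# Hypothetical helper: dump the UTF-8 encoding of U+00A0 as hex.
# The two bytes are spelled out in octal (\302 \240 = 0xc2 0xa0).
nbsp_hex() {
  printf '\302\240' | od -An -tx1 | tr -d ' \n'
}

nbsp_hex   # prints c2a0: two bytes, so a byte-oriented tool sees no single "NBSP" unit
```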

Oh right, sorry. wc has its own iswnbspace(),
whereas fold essentially relies on the system iswblank().

That means you could correlate with uniq though. Something like:

  isblank() { test $(printf "a$1a\nb$1b\n" | uniq -f1 | wc -l) = 2; }
  if ! isblank '\u2007'; then
    # can test that '\u2007' is treated as a non-breaking space
    :
  fi

That would be a preferable way to gate the test.
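
To spell out why the uniq correlation works, here is a rough sketch using characters whose classification is not in doubt. The helper name is_field_sep is hypothetical, and it takes the literal character rather than a \u escape so that it does not depend on printf escape support:

```shell
# Hypothetical helper: succeeds if uniq(1) treats "$1" as a field separator.
is_field_sep() {
  # With -f1, uniq skips the first blank-delimited field before comparing.
  # If "$1" is blank, the remainders " a" and " b" differ, so 2 lines survive.
  # If "$1" is not blank, each line is a single field, nothing is left to
  # compare, and the two lines collapse into 1.
  n=$(printf 'a%sa\nb%sb\n' "$1" "$1" | uniq -f1 | wc -l | tr -d ' ')
  test "$n" = 2
}

is_field_sep ' ' && echo "space: separator"
is_field_sep '_' || echo "underscore: not a separator"
```

Whether '\u2007' lands in the first or second group is exactly what varies between systems, which is what makes this usable as a gate.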

Though I'm now thinking we should adjust fold(1) a little
so that it consistently avoids breaking at NBSP across systems.
I.e. move/rename iswnbspace() from wc.c to src/system.h
and use it in fold (and wc) to give consistent behavior.
I.e. fold would use: c32isblank() && ! c32isnbspace(),
and the test would stay as is.

cheers,
Padraig


