On Sat, Sep 13, 2025, at 11:02 AM, Greg Wooledge wrote:
> I really think this is a bad idea.  A script needs to have predictable
> behavior regardless of what bizarre locales may exist on the target
> system.

It turns out that observing this doesn't even require a particularly
"bizarre" locale.  ISO/IEC 8859-1 encodes NBSP as the single byte 0xA0,
so on macOS:

        $ export LC_ALL=en_US.ISO8859-1
        $ [[ $'\xA0' = [[:blank:]] ]]; echo "$?"
        0
        $ eval set a$'\xA0'z; echo "$# args"
        2 args

Behaviors vary somewhat among shells.  Some don't recognize A0 as
a <blank>, zsh recognizes it as a <blank> but doesn't delimit tokens
with it, and yash agrees with bash.  (Various compatibility modes
don't make a difference here.)

        $ cat /tmp/nbsp_test.sh
        nbsp=$(printf '\240')

        case $nbsp in
        [[:blank:]])
                printf 'blank, '
                ;;
        *)
                printf 'not blank, '
                ;;
        esac

        eval set "a${nbsp}z"

        case $# in
        1)
                echo not delimiting
                ;;
        2)
                echo delimiting
                ;;
        esac
        $ export LC_ALL=en_US.ISO8859-1
        $ /bin/bash /tmp/nbsp_test.sh   # bash 3.2.57
        blank, delimiting
        $ ~/build/bash/bash "$_"        # bash devel
        blank, delimiting
        $ dash "$_"                     # dash 0.5.12
        not blank, not delimiting
        $ /bin/ksh "$_"                 # ksh93u+ 2012-08-01
        not blank, not delimiting
        $ ksh "$_"                      # ksh93u+m/1.0.10 2024-08-01
        not blank, not delimiting
        $ mksh "$_"                     # mksh R59 2020/10/31
        not blank, not delimiting
        $ oksh "$_"                     # OpenBSD 7.7 ksh
        not blank, not delimiting
        $ yash "$_"                     # yash 2.58.1
        blank, delimiting
        $ zsh "$_"                      # zsh 5.9
        blank, not delimiting
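
For a script that has to re-parse data like this, one way to get the
same answer everywhere might be to pin the locale before the eval.
A minimal sketch, assuming the script needs no locale-aware behavior
elsewhere; count_words is a hypothetical helper, not part of the test
script above:

```shell
# Assumption: nothing later in the script relies on the user's locale.
LC_ALL=C
export LC_ALL

nbsp=$(printf '\240')   # the single byte 0xA0

# count_words (hypothetical helper): re-parse the argument with eval and
# print how many words the shell split it into.  Only for trusted input,
# since eval executes whatever it is given.
count_words() {
        eval "set -- $1"
        echo "$#"
}

count_words "a${nbsp}z"
```

In the POSIX locale only <space> and <tab> are <blank>s, so this
should print 1.  I have not run it against every shell listed above.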

POSIX seems to require delimiting on all <blank>s [*], without
qualification.

        7.  If the current character is an unquoted <blank>, any
            token containing the previous character is delimited
            and the current character shall be discarded.

yash takes this very seriously, delimiting on NBSP even in a UTF-8
locale, where bash does not.

        $ export LC_ALL=en_US.UTF-8
        $ [[ $'\uA0' = [[:blank:]] ]]; echo "$?"
        0
        $ bash -c 'set a'$'\uA0''z; echo "$# args"'
        1 args
        $ yash -c 'set a'$'\uA0''z; echo "$# args"'
        2 args
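
The two locales also encode NBSP differently, which may be part of
why bash's answer changes (an assumption on my part, not something
checked against the bash source).  A quick way to look at the bytes,
where hex_bytes is a hypothetical helper:

```shell
# hex_bytes (hypothetical helper): print stdin as hex digits, with the
# whitespace that od adds stripped out.
hex_bytes() {
        od -An -tx1 | tr -d ' \t\n'
        echo
}

printf '\240' | hex_bytes       # NBSP in ISO 8859-1: one byte
printf '\302\240' | hex_bytes   # NBSP (U+00A0) in UTF-8: two bytes
```

This prints a0 and then c2a0.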

  [*] https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/utilities/V3_chap02.html#tag_19_03

-- 
vq
