2019-11-17 01:25:31 -0800, Chris Carlen:
[...]
> # write 'REVERSE PILCROW SIGN' to B, then repeat as above:
> printf -v B '\u204B'
> set -- ${B//?()/ }
> echo "${@@Q}" #-> $'\342' $'\201' $'\213'
>
> # NOTE: Since there is only one character (under the UTF-8 locale),
> # this should have set only the first positional parameter with the
> # character REVERSE PILCROW SIGN, not split it into bytes (AFAIK).
[...]

Yes, the question is where to resume searching after a match of
an empty string in ${var//pattern/replacement}.

Note that it's even worse in ksh93, from which bash copied that
syntax:

$ A=$'\u2048\u2048' ksh93 -c 'printf "%q\n" "${A//?()/:}"'
$':\u[2048]:\x81:\x88:\u[2048]:\x81:\x88:'

(here with ksh93u+)

Then there's the question of what ${B/$'\201'/} should do.
Should that $'\201' match the byte component of the encoding of
U+204B?

It seems to me that zsh's approach is best:

$ A=$'\u2048\201\u2048' zsh -c "printf '%q\n' \"\${A//$'\201'/:}\""
⁈:⁈

That is, replace that \201 byte, except when it's part of a
properly encoded character.

Compare with:

$ A=$'\u2048\201\u2048' bash -c "printf '%q\n' \"\${A//$'\201'/:}\""
$'\342:\210:\342:\210'
$ A=$'\u2048\201\u2048' ksh93 -c "printf '%q\n' \"\${A//$'\201'/:}\""
$'\u[2048]:\x88:\u[2048]:\x88'

(or yash, which can't deal with that \201 byte at all as it can't
form a valid character).

-- 
Stephane
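
PS: to make the byte arithmetic behind the bash result concrete,
here is a minimal sketch. It assumes bash, and it forces the C
locale so that the substitution is byte-wise by construction; it
reproduces the same bytes as the bash output quoted above but says
nothing about how bash gets there in a UTF-8 locale. The variable
names A and R are only for illustration.

```bash
#!/usr/bin/env bash
# In the C locale every byte is its own character, so the result
# below does not depend on which UTF-8 locales are installed.
LC_ALL=C

# U+2048 is encoded in UTF-8 as the bytes E2 81 88 (octal 342 201 210).
# A holds: U+2048, a stray 0x81 byte, U+2048 (7 bytes in total).
A=$'\342\201\210\201\342\201\210'

# Byte-wise replacement: every 0x81 byte becomes ':', including the
# continuation byte inside each of the two encoded characters.
R=${A//$'\201'/:}

# Show the result in quoted form.
printf '%q\n' "$R"
```

Each U+2048 loses its middle byte to the replacement, which is why
neither character survives intact in the result.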