2019-11-18 20:46:26 +0000, Stephane Chazelas:
[...]
> > printf -v B '\u204B'
> > set -- ${B//?()/ }
> > echo "${@@Q}" #-> $'\342' $'\201' $'\213'
[...]
> It seems to me that zsh's approach is best:
>
> $ A=$'\u2048\201\u2048' zsh -c "printf '%q\n' \"\${A//$'\201'/:}\""
> ⁈:⁈
>
> That is replace that \201 byte, except when it's part of a
> properly encoded character.
[...]

Actually, zsh would also break a character if the byte to be
replaced is the first of the character:

$ A=$'\u2048\342\u2048' zsh -c "printf '%q\n' \"\${A//$'\342'/:}\""
:$'\201'$'\210'::$'\201'$'\210'

Note that in charsets like BIG5/GB18030... which have characters
whose encoding contains the encoding of other characters, bash
seems to behave better than in UTF-8.

For instance the encoding of é in BIG5-HKSCS is 0x88 0x6d where
0x6d is also the encoding of "m" like in ASCII.

$ printf é | iconv -t big5-hkscs | od -tc -tx1
0000000 210   m
         88  6d
0000002
$ LC_ALL=zh_HK.big5hkscs luit
$ U=Stéphane bash -c 'printf "%s\n" "${U//m}"'
Stéphane
$ U=Stéphane ksh93 -c 'printf "%s\n" "${U//m}"'
Stéphane
$ U=Stéphane zsh -c 'printf "%s\n" "${U//m}"'
Stéphane

All 3 shells OK, but:

$ U=Stéphane bash -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
$ U=Stéphane ksh -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
$ U=Stéphane zsh -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane

All 3 shells "break" that é character there.

-- 
Stephane
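
For reference, the byte-level effect seen above can be reproduced without any of the three shells' ${var//pattern} logic. This is only a rough sketch: strip_byte is a hypothetical helper (not something from bash, ksh93 or zsh) that deletes one byte value from its input with tr under LC_ALL=C, so it has no notion of character boundaries at all. The input bytes are taken from the od output and the %q output shown earlier.

```shell
# Hypothetical helper, for illustration only: delete every byte with the
# given octal value from stdin. LC_ALL=C makes tr work on raw bytes.
strip_byte() {
  LC_ALL=C tr -d "\\$1"
}

# "St", then 0x88 0x6d (é in BIG5-HKSCS, octal 210 155), then "phane".
# Dropping the 0x88 lead byte leaves the 0x6d trail byte, which reads as "m".
printf 'St\210mphane\n' | strip_byte 210

# U+2048 in UTF-8 is the three bytes 342 201 210 (octal). Dropping the
# first byte leaves the two continuation bytes, dumped here in octal.
printf '\342\201\210' | strip_byte 342 | od -An -to1
```

The first command prints Stmphane, the same result the three shells gave for ${U//$'\210'} in the BIG5-HKSCS locale.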