Re: Locale not Obeyed by Parameter Expansion with Pattern Substitution

Stephane Chazelas Mon, 18 Nov 2019 23:58:03 -0800

2019-11-18 20:46:26 +0000, Stephane Chazelas:
[...]
> > printf -v B '\u204B'
> > set -- ${B//?()/ }
> > echo "${@@Q}"       #-> $'\342' $'\201' $'\213'
[...]
> It seems to me that zsh's approach is best:
> 
> $ A=$'\u2048\201\u2048' zsh  -c "printf '%q\n' \"\${A//$'\201'/:}\""
> ⁈:⁈
> 
> That is replace that \201 byte, except when it's part of a
> properly encoded character.
[...]


Actually, zsh also breaks a character when the byte to be
replaced is the first byte of that character's encoding:

$ A=$'\u2048\342\u2048' zsh -c "printf '%q\n' \"\${A//$'\342'/:}\""
:$'\201'$'\210'::$'\201'$'\210'
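
For reference, U+2048 is encoded in UTF-8 as the three bytes 342 201 210
(octal), so \342 is its lead byte and \201 a continuation byte. A quick
way to check that, assuming a POSIX printf and od (the variable name
"bytes" is only for illustration):

```shell
# U+2048 written out as its UTF-8 bytes using octal escapes,
# so no UTF-8 locale support is needed to produce them.
bytes=$(printf '\342\201\210' | od -An -to1)

# Prints the three octal byte values; spacing depends on the od implementation.
printf '%s\n' "$bytes"
```

The dump should list the bytes in the same order as the $'\342',
$'\201' and $'\210' escapes seen in the outputs above.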

Note that in charsets like BIG5 or GB18030, where the encoding of
some characters contains the encoding of other characters, bash
seems to behave better than it does in UTF-8.

For instance, é is encoded in BIG5-HKSCS as 0x88 0x6d, where
0x6d is also the encoding of "m", as in ASCII.

$ printf é | iconv -t big5-hkscs | od -tc -tx1
0000000 210   m
         88  6d
0000002
$ LC_ALL=zh_HK.big5hkscs luit
$ U=Stéphane bash -c 'printf "%s\n" "${U//m}"'
Stéphane
$ U=Stéphane ksh93 -c 'printf "%s\n" "${U//m}"'
Stéphane
$ U=Stéphane zsh -c 'printf "%s\n" "${U//m}"'
Stéphane

All 3 shells are OK there (the 0x6d inside é is left alone), but:

$ U=Stéphane bash -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
$ U=Stéphane ksh  -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane
$ U=Stéphane zsh  -c 'printf "%s\n" "${U//$'\''\210'\''}"'
Stmphane

All 3 shells "break" that é character there: its first byte (0x88)
is removed and the second byte is left behind as "m".
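
The byte-level effect can be sketched without a big5-hkscs locale or
luit, by building the BIG5-HKSCS bytes by hand and running bash in the
C locale, where every byte is its own character. This only shows what
deleting byte 0x88 does to the string; it does not reproduce the
locale-aware runs above, and the variable name "stripped" is only for
illustration:

```shell
# "Stéphane" as BIG5-HKSCS bytes: é is 0x88 0x6d, i.e. octal 210 followed by "m".
U=$(printf 'St\210mphane')

# Delete every 0x88 byte with bash's pattern substitution, in the C locale.
# The 0x6d that was the second half of é stays behind as a plain "m".
stripped=$(LC_ALL=C U="$U" bash -c 'printf "%s" "${U//$'\''\210'\''}"')

printf '%s\n' "$stripped"
```

This should print the same "Stmphane" as the three transcripts above.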

-- 
Stephane
