On 16/03/15 02:30, Bruno Haible wrote:
> POSIX [1] specifies that the recognition of characters in 'tr' depends on
> the environment variables LANG, etc.
>
> But trying to replace a multibyte character by another character does not
> work:
>
> $ echo $LANG
> de_DE.UTF-8
> $ enspace=`printf '\u2002'`
> $ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
> 0000000 58 20 20 20 59
> 0000005
>
> Expected output would be:
> $ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
> 0000000 58 20 59
> 0000003
>
> With 'sed' it works:
>
> $ echo -n "X${enspace}Y" | sed -e "s/${enspace}/ /g" | od -t x1
> 0000000 58 20 59
> 0000003
>
> Bruno
>
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html
Yes, you're right, Bruno.
Multi-byte support in coreutils has languished in general,
but we hope to start improving it in the next major release (9?),
after the imminent 8.24 stable release.
To that end I've put together a plan:
http://www.pixelbeat.org/docs/coreutils_i18n/
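Until that work lands, the sed approach from your report is a reasonable
substitute. As a rough sketch (not taken from the tr sources): U+2002 is
encoded in UTF-8 as the three bytes e2 80 82, and a tr that works on bytes
maps each of them separately, which would account for the three spaces you
saw. The snippet assumes a POSIX shell with od and sed available, and it
writes the character as octal escapes so it does not rely on printf
understanding '\u2002'.

```shell
# U+2002 (EN SPACE) spelled out as its three UTF-8 bytes in octal.
enspace=$(printf '\342\200\202')

# Show the encoding: one character, three bytes (e2 80 82).
printf '%s' "$enspace" | od -An -t x1

# Byte-oriented tr treats those three bytes as three separate set members;
# the report above shows 58 20 20 20 59 for this pipeline on the affected tr.
printf 'X%sY' "$enspace" | tr "$enspace" ' ' | od -An -t x1

# sed matches the whole byte sequence and replaces it with a single space,
# giving 58 20 59.
printf 'X%sY' "$enspace" | sed -e "s/${enspace}/ /g" | od -An -t x1
```

The sed form matches the full three-byte sequence whether the locale is
UTF-8 or C, so it does not depend on the multi-byte work being done.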
cheers,
Pádraig.