bug#12192: tr - bytes vs characters

Jim Meyering Sat, 15 Sep 2012 03:29:07 -0700

forcemerge 12192 9365
thanks

Michael Stummvoll wrote:
> Hi gnu folks,
>
> As is already known, tr cannot handle multibyte encodings like UTF-8:
>
>> mst@eddie:~$ echo "foo" | tr o ö
>> fÃÃ
>
> I know that multibyte encoding support is not needed for
> POSIX compliance, BUT:
>
> The man page of tr says the following:
>
>> Translate, squeeze, and/or delete characters from standard input,
>> writing to standard output.
>
> And that's the inconsistency, IMHO.
>
> The typical interpretation of "character" in such a context is one
> character on display, regardless of which encoding is used or how many
> bytes are used to represent it. So, if tr really translates "characters",
> it should preserve the encoding. If it doesn't, it does not
> translate "characters" but "bytes". So I see two ways forward:
>
> - add multibyte encoding support to tr
> or
> - change the man page and help text to say "bytes" rather than "characters"
>
> Since it doesn't seem that anybody wants to add the support to tr, an
> update of the man page would be the easier way to ensure consistency.


Thanks for the report.
I'm merging this issue with the others that relate to tr
and multi-byte support.
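
For readers of the archive, the byte-versus-character distinction in the
report can be sketched in a few lines of Python. This is only an
illustration of the two behaviours, not tr's actual code; the function
names translate_bytes and translate_chars are hypothetical, and the sketch
assumes UTF-8 input and that the extra byte of the replacement is simply
ignored because SET2 ends up longer than SET1.

```python
# Illustration only: emulates byte-wise vs character-wise translation.
# Not taken from coreutils; both function names are hypothetical.

O_UMLAUT = "\u00f6"  # the character "o with diaeresis" from the report


def translate_bytes(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Map the i-th byte of set1 to the i-th byte of set2."""
    # zip() stops at the shorter set, so the second byte of a two-byte
    # UTF-8 replacement character is never used.
    table = dict(zip(set1, set2))
    return bytes(table.get(b, b) for b in data)


def translate_chars(text: str, set1: str, set2: str) -> str:
    """Map the i-th character of set1 to the i-th character of set2."""
    return text.translate(str.maketrans(set1, set2))


# Byte-oriented: "o" (0x6f) is mapped to 0xc3, the first byte of the
# two-byte UTF-8 sequence c3 b6, so the result is not valid UTF-8.
byte_result = translate_bytes(b"foo\n", b"o", O_UMLAUT.encode("utf-8"))
print(byte_result)

# Character-oriented: each "o" becomes the whole two-byte sequence.
char_result = translate_chars("foo\n", "o", O_UMLAUT)
print(char_result.encode("utf-8"))
```

Viewed through a Latin-1 terminal, the bytes 66 c3 c3 from the first call
render as the garbled string quoted in the report, while the second call
yields a well-formed UTF-8 result.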
