I don't know the official way to report a bug in 'tr', so I am copying this list too. Please CC me on replies, as I am not subscribed.
> ----- Original Message -----
> From: Marton Kadar
> Sent: 02/24/12 03:18 PM
> To: [email protected]
> Subject: Example
>
> Environment for Hungarian, where á and í are proper lowercase letters
> (Spanish, for example, has these letters too):
>
> $ set | grep ^L
> LANG=hu_HU.UTF-8
> LC_ALL=hu_HU.UTF-8
> LINES=73
> LOGNAME=kadar1marto518
>
> Now let's look at the byte stream for the following string
> (which means "flood" in Hungarian):
>
> $ echo árvíz | od -c
> 0000000 303 241 r v 303 255 z \n
> 0000010
>
> Let us try to delete a character and see whether it works:
>
> $ echo árvíz | tr -d á | od -c
> 0000000 r v 255 z \n
> 0000005
>
> The correct, expected behavior would be:
>
> $ echo árvíz | tr -d á | od -c
> 0000000 r v 303 255 z \n
> 0000006
>
> I'll check the source of tr myself, although I have never coded in C.
> This should be a trivial fix. The problem is especially annoying
> because we currently have no really simple, good general-purpose
> case conversion tool (correct me if I'm wrong, but tr should be
> this tool).
>
> Marton Kadar
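The od output above is consistent with tr treating its sets as individual bytes rather than characters. In UTF-8, á is the two bytes 303 241, so `tr -d á` would delete every 303 byte and every 241 byte. That also strips the lead byte of í (303 255), leaving the orphaned 255. The sketch below reproduces this by spelling out the bytes with octal escapes, and shows a possible workaround with sed, which matches the whole byte sequence. It assumes a UTF-8 encoded script and a byte-oriented tr such as the GNU one; it is an illustration, not a fix for tr itself.

```shell
#!/bin/sh
# Sketch only: assumes this file is saved as UTF-8, so 'á' is the bytes \303\241.

word='árvíz'

# Deleting the two bytes of 'á' explicitly: each byte is removed wherever it
# occurs, including the \303 lead byte of 'í'. This mirrors 'tr -d á' when tr
# works byte by byte.
printf '%s\n' "$word" | tr -d '\303\241' | od -c

# Possible workaround: sed replaces the complete sequence \303\241 only,
# so 'í' (\303\255) survives intact.
printf '%s\n' "$word" | sed 's/á//g' | od -c
```

With the sed variant the byte stream matches the "expected behavior" quoted above (r v 303 255 z \n).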
