On Fri, Feb 24, 2012 at 09:29:12AM EST, Marton Kadar wrote: [..]
> > $ set | grep ^L
> > LANG=hu_HU.UTF-8
> > LC_ALL=hu_HU.UTF-8
> > LINES=73
> > LOGNAME=kadar1marto518
> >
> > Now let's see the bytestream for the following string
> > (which means flood in Hungarian):
> >
> > $ echo árvíz | od -c
> > 0000000 303 241   r   v 303 255   z  \n
> > 0000010
> >
> > Let us try to delete a character and see if it worked:
> >
> > $ echo árvíz | tr -d á | od -c
> > 0000000   r   v 255   z  \n
> > 0000005

[..]

Try this for size...

$ echo árvíz | od -t x1z -w16
$ echo árvíz | tr -d é | od -t x1z -w16
$ echo árvíz | tr -d é > /tmp/u.txt
$ isutf8 /tmp/u.txt

And there is not even an ‘é’ in ‘árvíz’..

CJ

P.S. Though you do have to look for it a bit, the coreutils manual
clearly states that only single-byte encodings are supported:

http://www.gnu.org/software/coreutils/manual/html_node/tr-invocation.html

-- 
Mooo Canada!!!!
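
A minimal sketch of one way around this, assuming a sed is on hand. The
od output above suggests tr takes the two bytes of á (303 241) as a set
of single bytes and strips each one wherever it occurs, which is why the
303 belonging to í went missing too. Substituting the whole byte sequence
avoids that. Note that del_char is a made-up helper name, not a coreutils
command, and it assumes its argument contains no regex metacharacters or
slashes.

```shell
# del_char: hypothetical helper, not part of coreutils.
# Deletes every occurrence of the character given as $1 from stdin.
# Assumption: $1 contains no regex metacharacters and no '/'.
del_char() {
    # LC_ALL=C makes sed match bytewise, so the multibyte character is
    # removed as one exact byte sequence, not as a set of loose bytes.
    LC_ALL=C sed "s/$1//g"
}

# Remove á only; the two bytes of í (303 255) should survive intact.
printf 'árvíz\n' | del_char á | od -c
```

Running the result through isutf8, as above, is a quick way to confirm
that what comes out is still valid UTF-8.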
