On Mon, Feb 01, 2010 at 07:29:40AM EST, kj wrote:
> How can I use grep to search for Chinese characters in a utf-8-encoded
> file? Using grep naively (i.e. just putting the Chinese characters
> where the search term normally goes in the command line) does not
> work.

Works here:

  $ echo '䈀' > /tmp/U+4200.txt
  $ grep '䈀' /tmp/U+4200.txt
  䈀
  $ hd /tmp/U+4200.txt
  00000000  e4 88 80 0a                                       |....|
  00000004

I did this on an xterm using scim's 'other->t-unicode' language option,
where the character is entered by default via Ctrl-U and you type the
code point in the popup.

You could perhaps give more detail as to what "does not work" means, and
maybe the operational context of your test, although I don't know which
details might be relevant: what character(s), what terminal emulator,
what shell, what version of grep, the exact command entered, what input
method, what font, what OS, what distribution.

In other words, a precise description of the scenario would be useful if
someone is to try to recreate your problem.

CJ
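
P.S. For anyone who wants to repeat the test without an input method, here is a rough sketch that writes the same character from its UTF-8 bytes (e4 88 80, as in the hex dump above) and searches for it. It is only an illustration under some assumptions: a POSIX-like shell with printf, grep and od, plus mktemp for the scratch file. Forcing LC_ALL=C and using -F are my additions so the match is done byte-for-byte whatever the locale is; the test above did not set either.

```shell
# Scratch file instead of /tmp/U+4200.txt (mktemp is assumed to be available).
tmpfile=$(mktemp)

# U+4200 as UTF-8 is e4 88 80; printf takes the bytes as octal escapes.
printf '\344\210\200\n' > "$tmpfile"

# Count matching lines. LC_ALL=C and -F (fixed string) are assumptions added
# here so that grep compares raw bytes and ignores the locale.
match=$(LC_ALL=C grep -c -F "$(printf '\344\210\200')" "$tmpfile")

# Dump the file as hex bytes, with od's whitespace stripped.
bytes=$(od -An -tx1 "$tmpfile" | tr -d ' \n')

echo "matching lines: $match"
echo "file bytes:     $bytes"

rm -f "$tmpfile"
```

If the byte-wise search finds the line but a plain grep with the character typed on the command line does not, the terminal, input method or locale settings would be the first things to compare.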