Re: grep & utf-8

Chris Jones Tue, 02 Feb 2010 03:44:05 -0800

On Mon, Feb 01, 2010 at 07:29:40AM EST, kj wrote:

> How can I use grep to search for Chinese characters in a utf-8-encoded
> file?  Using grep naively (i.e. just putting the Chinese characters
> where the search term normally goes in the command line) does not
> work.


Works here:

$ echo '䈀' > /tmp/U+4200.txt
$ grep '䈀' /tmp/U+4200.txt
䈀

$ hd /tmp/U+4200.txt
00000000  e4 88 80 0a                                       |....|
00000004
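
To take the terminal and input method out of the picture, the same test
can be scripted with the bytes spelled out explicitly. This is only a
sketch: the scratch file and the variable names are mine, and it assumes
a POSIX shell with mktemp, od and grep available. The octal escapes
\344\210\200 are the bytes e4 88 80 shown in the dump above.

```shell
#!/bin/sh
# Scratch file; the name is whatever mktemp picks (illustrative only).
f=$(mktemp)

# U+4200 encoded as UTF-8 is e4 88 80. Octal escapes are used because
# \x escapes are not portable across printf implementations.
printf '\344\210\200\n' > "$f"

# Hex of the file contents with all whitespace stripped.
hex=$(od -An -tx1 "$f" | tr -d ' \t\n')

# Build the search pattern from the same three bytes.
pattern=$(printf '\344\210\200')

# LC_ALL=C makes grep compare raw bytes, -F treats the pattern as a
# fixed string, and -c prints the number of matching lines.
count=$(LC_ALL=C grep -c -F -- "$pattern" "$f")

echo "bytes: $hex"
echo "matching lines: $count"

rm -f "$f"
```

If this reports one matching line but typing the character on the
command line still fails, the bytes reaching grep are probably not the
ones you expect, which points at the terminal, the input method or the
locale rather than at grep.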

I did this on an xterm using scim's 'other->t-unicode' language option,
where the character is entered by default via Ctrl-U and you type the
code point in the popup.

You could perhaps give more detail as to what "does not work" means, and
describe the operational context of your test, although I don't know
which parts might be relevant: what character(s), what terminal
emulator, what shell, what version of grep, the exact command entered,
what input method, what font, what OS, what distribution.

In other words a precise description of the scenario would be useful if
someone is to try and recreate your problem.
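
Most of those details can be gathered in one go with something like the
sketch below. The function name report_env is made up for illustration,
and "grep --version" assumes GNU or BSD grep; other implementations may
not accept that option.

```shell
#!/bin/sh
# Print the environment details that matter when reproducing a
# grep/UTF-8 problem. report_env is a hypothetical helper name.
report_env() {
    echo "shell: ${SHELL:-unknown}"
    echo "term: ${TERM:-unknown}"
    echo "os: $(uname -sr)"
    # Only the first line of the version banner is kept.
    echo "grep: $(grep --version 2>/dev/null | head -n 1)"
    echo "locale:"
    # locale lists LANG and the LC_* settings currently in effect.
    locale 2>/dev/null || echo "  (locale command not available)"
}

report_env
```

The font and the input method are not covered here and would still need
to be described by hand, along with the exact command entered.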

CJ
