Titus von der Malsburg wrote:
> When I do a simple grep on a 50Mb file with ~1.3 million lines, it
> takes 2s on Linux (Ubuntu karmic with stock kernel, 2.6.31-17) and
> ~12min on OSX (v. 10.5.8):
>
>   grep '^[0-9]' < file.dat > /dev/null
>
> ~1.2 million lines actually begin with a number.  Both systems run on
> a Core 2 Duo CPU at 2.2 GHz and have 2GB of RAM.  On both systems, I
> use utf-8 encoding (en_US.UTF-8).

Thanks for the report.
I can confirm that something is very wrong.

In the C/POSIX locale, Fedora 12's grep is quick
(this is on a tmpfs file system):

    $ yes 123456789012345678901234567890|head -n1300000 > in
    $ env time grep '^[0-9]' in > /dev/null
    0.15user 0.01system 0:00.16elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
    200inputs+0outputs (1major+213minor)pagefaults 0swaps

But in a UTF-8 locale, it's incredibly slow:

    $ LC_ALL=en_GB.UTF-8 env time grep '^[0-9]' in > /dev/null

I ran out of patience and interrupted the above after a couple of minutes.
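
For anyone who wants to repeat the comparison on another system, here is
a rough sketch that rebuilds the same input and runs the same search
under both locales.  The time_grep helper is my own hypothetical name,
not part of grep, and it assumes a date(1) that supports +%s.  It reports
only whole seconds, which is enough to tell a fraction of a second from
several minutes.  If en_GB.UTF-8 is not installed, grep will most likely
fall back to the C locale and the two runs will look alike.

```shell
#!/bin/sh
# Rebuild the 1.3-million-line input; every line begins with a digit.
yes 123456789012345678901234567890 | head -n 1300000 > in

# time_grep LOCALE  (hypothetical helper, not part of grep)
# Runs the search under the given locale and prints the locale, the
# number of matching lines, and the elapsed wall-clock time in seconds.
time_grep() {
    start=$(date +%s)
    count=$(LC_ALL="$1" grep -c '^[0-9]' in)
    end=$(date +%s)
    echo "$1 matches=$count seconds=$((end - start))"
}

time_grep C
time_grep en_GB.UTF-8
```

Both runs should report the same match count; only the elapsed time
should differ on an affected build.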
