On 02/06/2010 06:25 PM, Titus von der Malsburg wrote:
On Sat, Feb 6, 2010 at 4:54 PM, Paolo Bonzini<[email protected]> wrote:
Can you try with 2.5.4 on Linux, but configuring with --with-included-regex?
Then it takes 19 min. When I do the same on OSX, it again takes 12 min.
That's more or less the same ratio as for your hand-crafted program, so
the performance is the same apart from the CPU speed.
Does this mean that the Berkeley implementation of regex on OSX is
crappy and that grep therefore uses its own engine? Is there a way
to build grep with the GNU engine on OSX?
It means we identified the culprit (the regex engine; possibly the
strcoll/wcscoll calls used to handle [0-9]), but it is still strange
because, AFAICT, the regex code should not even be invoked unless you
use grep -o or --color or similar options. Instead, grep should use its
own DFA implementation.
Running under LC_ALL=C should mitigate or eliminate the bad performance.
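
To check whether the locale really is the factor, something like the following could be used. It is only a rough sketch: it assumes grep is on the PATH, the helper name time_grep is made up for illustration, and the pattern and file are whatever you pass on the command line. The measured time includes process startup, so it is only meaningful on a large input.

```python
import os
import subprocess
import sys
import time


def time_grep(pattern, path, lc_all=None):
    """Run `grep -c` and return (matching_line_count, elapsed_seconds).

    If lc_all is given, LC_ALL is overridden for the child process only;
    otherwise grep inherits the current environment unchanged.
    """
    env = dict(os.environ)
    if lc_all is not None:
        env["LC_ALL"] = lc_all
    start = time.perf_counter()
    proc = subprocess.run(
        ["grep", "-c", "-e", pattern, "--", path],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    elapsed = time.perf_counter() - start
    # grep exits with 0 on a match, 1 on no match, and 2 on error.
    if proc.returncode > 1:
        raise RuntimeError(proc.stderr.strip())
    return int(proc.stdout.strip()), elapsed


if __name__ == "__main__":
    # Hypothetical usage: python time_grep.py '[0-9]' some-large-file
    pattern, path = sys.argv[1], sys.argv[2]
    for label, lc_all in (("current locale", None), ("LC_ALL=C", "C")):
        count, seconds = time_grep(pattern, path, lc_all)
        print(f"{label}: {count} matching lines in {seconds:.2f} s")
```

If the LC_ALL=C run is much faster on the same file and pattern, that would point at the locale-dependent code path.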
Paolo