Why is my regex so slow?

Mark Wagner Fri, 31 Oct 2008 12:54:22 -0700

I've got a script I'm using to search through a list of Wikipedia
article titles to find ones that match certain patterns.


As written, if you run it and supply '.*target.*' on standard input,
it will process my test file in 125 seconds.  Make any of the changes
mentioned in the comments, and the time needed will drop to 1.8
seconds.  Why the difference?  Particularly interesting is that it
seems to matter where the regex pattern came from: if it's from
standard input, testing is slow; if it's assigned in the script,
testing is fast.

If it matters, I'm using Perl 5.8.8.

To see the problem I'm having, download
http://download.wikimedia.org/eswiki/20081018/eswiki-20081018-all-titles-in-ns0.gz
(a 4.1-MB file), decompress it with gunzip, and run the program with the
name of the decompressed file as its first argument.

Thanks,
Mark Wagner

--------------
binmode STDIN, ":utf8"; # Comment this out to speed things up

while(<STDIN>)
{
        my $lines = 0;
        my $lines2 = 0;
        my $regex;
        $regex = $_;
        chomp $regex;

        #$regex = '.*target.*'; # Or uncomment this to speed things up
        open INFILE, "<", $ARGV[0] or die "Cannot open $ARGV[0]: $!";
        binmode INFILE, ":utf8"; # Or comment this out to speed things up

        while(<INFILE>)
        {
                my $target = $_;
                chomp $target;
                $target =~ s/_/ /g;

                # Or make this case-insensitive to speed things up, or remove
                # the start and end anchors to speed things up
                print "Match\n" if($target =~ /^$regex$/);

                $lines = $lines + 1;
                if($lines >= 10000)
                {
                        $lines = 0;
                        $lines2 += 10000;
                        print STDERR "$lines2\r";
                }
        }
}
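
For anyone who wants to try the comparison without downloading the
dump, here is a minimal sketch of a timing harness. Several things in it
are assumptions and not part of the script above: the title list is
made up, time_match is a hypothetical helper, and utf8::upgrade is only
a stand-in for reading the pattern through a ":utf8" layer. Whether the
UTF-8 flag on the pattern is the relevant difference is a guess, not a
confirmed cause, and a synthetic list may not show the same gap.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the title list; the real test uses the eswiki dump.
my @titles = map { "Art\x{ED}culo_de_prueba_$_" } 1 .. 20000;

# time_match is a hypothetical helper: it returns (elapsed seconds, match count)
# for one pattern run against every title, using the same anchored match as above.
sub time_match {
    my ($regex) = @_;
    my $matches = 0;
    my $start   = time();
    for my $title (@titles) {
        (my $target = $title) =~ s/_/ /g;
        $matches++ if $target =~ /^$regex$/;
    }
    return (time() - $start, $matches);
}

my $literal = '.*prueba.*';   # pattern assigned in the script
my $decoded = $literal;
utf8::upgrade($decoded);      # assumption: mimics a pattern read via a :utf8 layer

for my $case ([literal => $literal], [decoded => $decoded]) {
    my ($name, $regex) = @$case;
    my ($secs, $count) = time_match($regex);
    printf "%-8s utf8=%d  %d matches in %.3f s\n",
        $name, utf8::is_utf8($regex) ? 1 : 0, $count, $secs;
}
```

Both patterns contain the same characters, so the match counts should
agree; only the internal flag reported by utf8::is_utf8 and possibly the
timing differ.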

