try this:
print "Match\n" if($target =~ /^$regex$/*o*); only compile regex only once On Fri, Oct 31, 2008 at 2:54 PM, Mark Wagner <[EMAIL PROTECTED]> wrote: > I've got a script I'm using to search through a list of Wikipedia > article titles to find ones that match certain patterns. > > As-written, if you run it and supply '.*target.*' on standard input, > it will process my test file in 125 seconds. Make any of the changes > mentioned in the comments, and the time needed will drop to 1.8 > seconds. Why the difference? Particularly interesting is that it > seems to matter where the regex pattern came from: if it's from > standard input, testing is slow; if it's assigned in the script, > testing is fast. > > If it matters, I'm using Perl 5.8.8. > > To see the problem I'm having, download > > http://download.wikimedia.org/eswiki/20081018/eswiki-20081018-all-titles-in-ns0.gz > (a 4.1-MB file), unzip it, and run the program supplying the name of > the unzipped file. > > Thanks, > Mark Wagner > > -------------- > binmode STDIN, ":utf8"; # Comment this out to speed things up > > while(<STDIN>) > { > my $lines = 0; > my $lines2 = 0; > my $regex; > $regex = $_; > chomp $regex; > > #$regex = '.*target.*'; # Or uncomment this to speed things up > open INFILE, "<", $ARGV[0]; > binmode INFILE, ":utf8"; # Or comment this out to speed things up > > while(<INFILE>) > { > my $target = $_; > chomp $target; > $target =~ s/_/ /g; > > print "Match\n" if($target =~ /^$regex$/); # Or make > this case-insensitive to speed things up, or remove the start and end > anchors to speed things up > > $lines = $lines + 1; > if($lines >= 10000) > { > $lines = 0; > $lines2 += 10000; > print STDERR "$lines2\r"; > } > } > } > > -- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > http://learn.perl.org/ > > >