I've got a script I'm using to search through a list of Wikipedia article titles to find ones that match certain patterns.
As written, if you run it and supply '.*target.*' on standard input, it processes my test file in 125 seconds. Make any one of the changes mentioned in the comments, and the time drops to 1.8 seconds. Why the difference?

What's particularly interesting is that it seems to matter where the regex pattern comes from: if it's read from standard input, matching is slow; if it's assigned in the script, matching is fast. In case it matters, I'm using Perl 5.8.8.

To reproduce the problem, download http://download.wikimedia.org/eswiki/20081018/eswiki-20081018-all-titles-in-ns0.gz (a 4.1 MB file), unzip it, and run the program with the name of the unzipped file as its argument.

Thanks,
Mark Wagner

--------------

    use strict;
    use warnings;

    binmode STDIN, ":utf8"; # Comment this out to speed things up

    while(<STDIN>)
    {
        my $lines = 0;
        my $lines2 = 0;
        my $regex = $_;
        chomp $regex;
        #$regex = '.*target.*'; # Or uncomment this to speed things up

        open INFILE, "<", $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
        binmode INFILE, ":utf8"; # Or comment this out to speed things up
        while(<INFILE>)
        {
            my $target = $_;
            chomp $target;
            $target =~ s/_/ /g;
            # Or make this case-insensitive to speed things up,
            # or remove the start and end anchors to speed things up
            print "Match\n" if($target =~ /^$regex$/);
            $lines++;
            if($lines >= 10000)
            {
                $lines = 0;
                $lines2 += 10000;
                print STDERR "$lines2\r"; # progress counter
            }
        }
        close INFILE;
    }
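One way to narrow down why the pattern's origin matters: with `binmode STDIN, ":utf8"` in effect, every line read from STDIN carries Perl's internal UTF-8 flag, even when it is pure ASCII. A literal '.*target.*' in a script without `use utf8` does not carry that flag. The sketch below is a hypothetical diagnostic, not part of the original script. It assumes that flag is the relevant variable and uses `utf8::upgrade` to make a flagged copy of the same pattern. The `time_match` helper is made up for illustration. It then times the same anchored match against the titles file with both versions of the pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper (not from the original script): counts titles that
# match /^$regex$/ and returns (match count, elapsed seconds).
sub time_match {
    my ($regex, $titles) = @_;
    my $start   = Time::HiRes::time();
    my $matches = 0;
    for my $t (@$titles) {
        $matches++ if $t =~ /^$regex$/;
    }
    return ($matches, Time::HiRes::time() - $start);
}

# Same characters, two internal representations.
my $plain   = '.*target.*';   # literal in the script: UTF-8 flag off
my $flagged = $plain;
utf8::upgrade($flagged);      # UTF-8 flag on, as if read through ":utf8"

if (@ARGV) {
    # Read the titles once, decoded as UTF-8 as in the original script.
    open my $in, '<:utf8', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my @titles;
    while (my $t = <$in>) {
        chomp $t;
        $t =~ s/_/ /g;
        push @titles, $t;
    }
    close $in;

    for my $case ([plain => $plain], [flagged => $flagged]) {
        my ($name, $regex) = @$case;
        my ($matches, $secs) = time_match($regex, \@titles);
        printf "%-7s utf8=%d matches=%d time=%.2fs\n",
            $name, (utf8::is_utf8($regex) ? 1 : 0), $matches, $secs;
    }
}
```

Run it as `perl harness.pl eswiki-20081018-all-titles-in-ns0`. If the "flagged" line is much slower than the "plain" line while reporting the same match count, that points to the UTF-8 flag on the pattern rather than to where the pattern was read from.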