On Thu, Jul 09, 2009 at 07:03:20PM -0400, Bernardo Rechea wrote:
> For reference, the brute force algorithm:
>
> #### Brute force
> T: foreach my $targetWord (@targetWords) {
> foreach my $searchWord (@searchWords) {
> if ($targetWord =~ /^$searchWord/) {
> push @foundWords, $targetWord;
> next T;
> }
> }
> }
> say $_ for @foundWords;
>
>
> $ time ./search_prefixWords.pl wn-tmp/data.srtu wn-tmp/data.srtu.5chars >
> out-brute.stdout
>
> real 21m54.758s
> user 21m44.346s
> sys 0m2.772s
Okay, I figured out why this code is so slow compared to mine. You're
iterating over the words and then the prefixes. Each time through the
inner loop, perl has to compile the regex: n*m regex compilations. If you
swap the loops, then perl compiles the regex for each prefix just once: m
regex compilations.
Original:
real 24m26.004s
user 24m13.461s
sys 0m3.160s
Swapped:
real 3m3.885s
user 3m3.342s
sys 0m0.280s
Note that swapping the loops means you may need to unique the list of found
words, but that's very fast to do. Alternatively, you could precompile the
regexes using qr//.
Ronald
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm