Re: [Boston.pm] Q: giant-but-simple regex efficiency

Uri Guttman Sat, 05 Feb 2011 12:41:22 -0800

>>>>> "MP" == Martyn Peck <[email protected]> writes:


  MP> What's wrong with something like this:

  MP>   while($line=<>){
  MP>       foreach my $name (@names){
  MP>           $line =~ s/($name)/prefix_$1/g;
  MP>       }
  MP>   }

it is O( N^2 ), since every line is scanned once for every name, and
that is very slow for large data sets.

  MP> I know it seems kind of brute force, looping through the same line 6000
  MP> times, but that's essentially what you were doing within the regex.

and it would even be slower your way since it is doing more perl ops and
that is the big bottleneck. staying in the regex engine is faster than
looping at the perl level.
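
a rough sketch of what staying in the regex engine can look like: build
one alternation from the list and run a single s///g per line. the names
in @names and the sub name are made up for illustration, and whether
this beats the loop depends on how well the engine handles a big
alternation.

```perl
use strict;
use warnings;

# hypothetical name list; the real one in this thread has ~6000 entries
my @names = qw(foo foobar baz);

# longest names first so "foobar" is tried before "foo";
# quotemeta escapes any regex metacharacters in a name
my $alt = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @names;

# compile once, outside the per-line loop
my $names_re = qr/($alt)/;

sub prefix_names {
    my ($line) = @_;
    # one pass over the line; the loop over names happens in the engine
    $line =~ s/$names_re/prefix_$1/g;
    return $line;
}

print prefix_names("foo foobar qux baz\n");
# prints: prefix_foo prefix_foobar qux prefix_baz
```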

  MP> My understanding is that regexes can do powerful and complex things, but
  MP> that that also makes them slow. And since this isn't actually all that
  MP> complex, most of the looping should be done in perl itself, and not in
  MP> the regex.

you have it backwards. most of your work is best done inside perl's
internals (like the regex engine) and not with perl-level code. but this
case is slightly different in that a giant alternation is slow in the
regex engine too: it can end up trying every name at every position,
which is the N^2 loop again. parsing out the names (if possible) and
checking them in a hash is O( N ). the trie ideas can be O( N ) too if
they work well.
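
a minimal sketch of the hash idea, assuming the names are plain \w+
tokens so one generic pattern can pull out every candidate (if the names
can contain other characters, the token pattern has to change). the name
list and the sub name are made up for illustration.

```perl
use strict;
use warnings;

# hypothetical name list; the real one in this thread has ~6000 entries
my @names = qw(alpha beta gamma);

# build the lookup table once: name => 1
my %is_name = map { $_ => 1 } @names;

sub add_prefix {
    my ($line) = @_;
    # grab each word once and do a constant-time hash check on it;
    # /e evaluates the replacement as perl code
    $line =~ s/(\w+)/exists $is_name{$1} ? "prefix_$1" : $1/ge;
    return $line;
}

print add_prefix("alpha calls beta, not alphabet or delta\n");
# prints: prefix_alpha calls prefix_beta, not alphabet or delta
```

the work per line is now proportional to the number of words in the
line and does not grow with the number of names.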

uri

-- 
Uri Guttman  ------  [email protected]  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

