Re: reg exp speed?

Alan Campbell Sun, 21 May 2006 09:24:03 -0700

hello folks,
   
  Thanks for all the advice. As several of you suggested, the winning ticket 
turned out to be switching to a line-by-line regex over array-slurped input, i.e.
   
  # look for potentially problematic disassembly of the following form: -
  # 8004b980   003c34f4           STW.D2T1      A0,*B15--[1]
  my @dis_lines = <>;
  foreach my $ln (@dis_lines) {
      if ($ln =~ m/.*ST[A-Z].*B15--.*[13579]]/) {
          print $ln;
      }
  }
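
  For anyone reading this in the archive, here is a minimal sketch of a fully
streaming variant. It reads one line at a time with while instead of loading
every line into an array first, and it drops the leading '.*' because an
unanchored match already finds 'ST' anywhere in the line. The sub name
matching_lines and the sample lines are made up for illustration, and I have
not timed this against the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines from a filehandle that look like a
# store through B15 with a post-decrement by an odd single-digit count.
sub matching_lines {
    my ($fh) = @_;
    my @hits;
    # Read one line per iteration so only the current line is held in memory.
    while (my $ln = <$fh>) {
        push @hits, $ln if $ln =~ /ST[A-Z].*B15--.*[13579]\]/;
    }
    return @hits;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = join '',
    "8004b980   003c34f4           STW.D2T1      A0,*B15--[1]\n",
    "8004b984   003c54f4           STW.D2T1      A0,*B15--[2]\n",
    "8004b988   023c22e4           LDW.D2T2      *++B15[1],B4\n";
open my $in, '<', \$sample or die "cannot open in-memory handle: $!";
print matching_lines($in);    # prints only the first sample line
close $in;
```

  In a real script the in-memory handle would be replaced by STDIN or a file
opened with open().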
   
  I think I got carried away by the last problem I had, which *required* a 
multi-line match, so a scalar slurp without line-by-line processing was faster.
   
  Thanks again for all the advice. Much appreciated. FYI the script now runs in 
1sec on a 9Mb file, as compared to 3min 30s previously!
   
  cheers, Alan
  
"John W. Krahn" <[EMAIL PROTECTED]> wrote:
  Alan Campbell wrote:
> hello folks,


Hello,

> I'm slurping in a large file and seeing a nice speedup versus line by line
> processing...but I'm losing it in my (likely poorly constructed!)
> reg-expression match
> 
> I do: -
> #
> # look for potentially problematic code of the following form: -
> # STW b0, *SP--[3]
> # The reg exp tries to match: -
> # - anything up until 'ST' (so that we match STH, STW, STDW etc) followed by
> # - 1+ non-whitespace chars followed by
> # - 0+ whitespace chars followed by
> # - 0+ non-whitespace chars followed by
> # the string 'B15--' followed by
> # anything up until an odd single-digit number followed by
> # the ']' character
> # Matches all occurrences
> #
> my @match_sp = $all_lines =~ /.*ST\S+\s*\S*B15--.*[^02468]]/mg;
> 
> ...and then I foreach on @match_sp to show all occurrences found...
> 
> Any speedup tips most welcome. Would also appreciate a brief explanation of
> why this reg ex is slow (so I dont screw it up next time!)

Your pattern starts with '.*' and the '*' quantifier is greedy, so it first
matches as many non-newline characters as possible and then back-tracks to
find 'ST'. You should use the non-greedy quantifier '*?' so it stops at the
first 'ST' instead of running to the end of the line and backing up. Your
character class [^02468] does indeed match odd digits, as well as every other
character that is not '0', '2', '4', '6' or '8'. You should probably use the
character class [13579] to match _only_ odd digits. The /m option is not
required because you are not using either '^' or '$' to anchor the pattern.

my @match_sp = $all_lines =~ /.*?ST\S+\s*\S*B15--.*?\[[13579]]/g;
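
The two points above can be seen in a small sketch. The test strings and the
sub names loose_odd and strict_odd are made up for illustration; they are not
from the original script.

```perl
use strict;
use warnings;

# Greedy vs. non-greedy: capture what the leading wildcard consumed.
my $text = "aaSTbbSTcc";
my ($greedy) = $text =~ /(.*)ST/;     # runs to the end, backs up to the LAST 'ST'
my ($lazy)   = $text =~ /(.*?)ST/;    # grows one char at a time, stops at the FIRST 'ST'
print "greedy prefix: $greedy\n";     # greedy prefix: aaSTbb
print "lazy prefix:   $lazy\n";       # lazy prefix:   aa

# Negated class vs. explicit class, each followed by a literal ']'.
sub loose_odd  { return $_[0] =~ /[^02468]\]/ ? 1 : 0 }
sub strict_odd { return $_[0] =~ /[13579]\]/  ? 1 : 0 }

# '3]' passes both, '4]' passes neither, 'x]' passes only the negated class.
for my $s ('3]', '4]', 'x]') {
    printf "%-3s loose=%d strict=%d\n", $s, loose_odd($s), strict_odd($s);
}
```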



John
-- 
use Perl;
program
fulfillment
