Re: regexp trouble

Jeff Pinyan Mon, 07 May 2001 06:34:10 -0700
On May 7, Johan Groth said:

>I want to strip a variable of all characters including a token, i.e.
>aaa_bbb_ccc_ddd would become bbb_ccc_ddd.  As you can see I want to get rid of
>aaa_. Does anyone know how to accomplish this in Perl?
>
>I've tried:
>$tmp = "aaa_bbb_ccc_ddd";
>$tmp =~ /^\w+_/;
>$tmp = $';
>
>but that results in $tmp eq "ddd" instead of "bbb_ccc_ddd".

Please do not use $` $& and $'.  Once any of them appears anywhere in
your program, Perl has to save the matched string for every regex match,
which slows down all of your program's regular expressions ($& not as
badly, but still).  Use, instead, either the $1, $2, ... variables, or in
your specific case, the s/// operator.

  $tmp =~ s/^[^\W_]+_//;
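
For comparison, here is a minimal sketch of both suggestions side by side:
the s/// form above and a capture into $1.  The names $stripped and $rest
are only illustrative, and the /s flag is my assumption so that the
captured tail may contain newlines.

```perl
use strict;
use warnings;

my $tmp = "aaa_bbb_ccc_ddd";

# Option 1: copy $tmp, then delete the leading token and its underscore.
(my $stripped = $tmp) =~ s/^[^\W_]+_//;

# Option 2: capture everything after the first token into $1.
my $rest;
if ($tmp =~ /^[^\W_]+_(.*)/s) {
    $rest = $1;
}

print "$stripped\n";   # bbb_ccc_ddd
print "$rest\n";       # bbb_ccc_ddd
```

Neither form touches $` $& or $', so the rest of the program's regexes are
unaffected.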

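As for why your regex fails, \w matches the '_' character, and + is
greedy.  So \w+ grabs as much as it can ("aaa_bbb_ccc"), the _ in your
regex matches the LAST underscore in the string, and only "ddd" is left
over for $'.  Here's output from 'explain':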

======================================================================
The regular expression:

(?-imsx:^\w+_)

matches as follows:
  
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  _                        '_'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
======================================================================
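
To see that greedy match in action, here is a small sketch that captures
what ^\w+_ consumes and what is left over.  The two capture groups and the
names $matched and $left are my additions for illustration; they are not
part of the original regex.

```perl
use strict;
use warnings;

my $tmp = "aaa_bbb_ccc_ddd";

# Group 1 is the original pattern; group 2 is whatever follows the match.
my ($matched, $left) = ("", "");
if ($tmp =~ /^(\w+_)(.*)/s) {
    ($matched, $left) = ($1, $2);
}

print "matched: $matched\n";   # matched: aaa_bbb_ccc_
print "left:    $left\n";      # left:    ddd
```

\w+ first swallows the whole string, then backs off one character at a
time until the trailing _ can match, which happens at the last underscore.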


Using the modified regex presented above, we have [^\W_] to mean "any
character EXCEPT non-word characters and _" which is a nifty way of saying
"all word characters except _".

======================================================================
The regular expression:

(?-imsx:^[^\W_]+_)

matches as follows:
  
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  [^\W_]+                  any character except: non-word characters
                           (all but a-z, A-Z, 0-9, _), '_' (1 or more
                           times (matching the most amount possible))
----------------------------------------------------------------------
  _                        '_'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
======================================================================
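
If you want to convince yourself of what [^\W_] accepts, here is a quick
sketch that tests a handful of single characters against both \w and
[^\W_].  The sample characters and the %result hash are arbitrary choices
of mine.

```perl
use strict;
use warnings;

my %result;
for my $ch ('a', 'Z', '7', '_', '-', ' ') {
    my $word   = $ch =~ /^\w$/     ? 1 : 0;   # is it a word character?
    my $strict = $ch =~ /^[^\W_]$/ ? 1 : 0;   # word character, but not '_'?
    $result{$ch} = [$word, $strict];
    printf "%-3s \\w=%d  [^\\W_]=%d\n", "'$ch'", $word, $strict;
}
```

The only character in this sample where the two classes disagree is '_':
\w accepts it and [^\W_] rejects it.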


-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
Are you a Monk?  http://www.perlmonks.com/     http://forums.perlguru.com/
Perl Programmer at RiskMetrics Group, Inc.     http://www.riskmetrics.com/
Acacia Fraternity, Rensselaer Chapter.         Brother #734