Regular Expression Quick Reference

Iain Truskett Sun, 20 Jul 2003 00:49:29 -0700

Is this of any use?

Attached, and also at:


    http://dellah.org/perlreref.pod
    http://dellah.org/perlreref.html

Amendments, suggestions, comments etc. are welcome, of course.


cheers,
-- 
Iain.

=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L<references|/"SEE ALSO"> section in this document.

=head1 OPERATORS

=over 4

=item =~

determines to which variable the regex is applied.
In its absence C<$_> is used.

        $var =~ /foo/;

=item m/pattern/igmsoxc

searches a string for a pattern match,
applying the given options.

        i  case Insensitive
        g  Global - all occurrences
        m  ^ and $ match internal lines
        s  . matches \n
        o  Compile pattern Once
        x  eXtended - free whitespace and comments
        c  Don't reset pos on fails when using /g

If C<pattern> is an empty string, the last I<successfully> matched regex
is used. Delimiters other than C</> may be used for both this operator
and the following ones.

=item qr/pattern/imsox

lets you store a regex in a variable,
or pass one around. Modifiers are as for C<m//> and are stored
within the regex.

=item s/pattern/replacement/igmsoxe

substitutes matches of
C<pattern> with C<replacement>. Modifiers are as for C<m//>,
with the addition of C<e>:

        e  Evaluate replacement as an expression

C<e> may be specified multiple times. C<replacement> is
interpreted as a double quoted string unless C<'> is used
as the delimiter.

=item ?pattern?

is like C<m/pattern/> but matches only once. No alternate
delimiters can be used. Must be reset with L<reset|perlfunc/"reset">.

=back
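
As a rough illustration of the operators above, the following sketch
stores a pattern with C<qr//>, binds it with C<=~>, and uses C<s///e>
to compute a replacement. The variable names and the sample text are
invented for the example.

```perl

    use strict;
    use warnings;

    # Sample data; the names here are illustrative only.
    my $text = "width=10 height=20";

    # qr// stores a compiled regex, together with its /i modifier.
    my $pair = qr/(\w+)=(\d+)/i;

    # =~ binds the match to $text; in list context m//g returns
    # the captures from every match.
    my @fields = $text =~ /$pair/g;

    # s///e evaluates the replacement as an expression: double each number.
    (my $doubled = $text) =~ s/(\d+)/$1 * 2/ge;

    print "@fields\n";
    print "$doubled\n";

```

Copying C<$text> before substituting leaves the original string untouched.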

=head1 SYNTAX

   \     Escapes the character(s) immediately following it
   .     Matches any single character except a newline (unless /s is used)
   ^     Matches at the beginning of the string (or line, if /m is used)
   $     Matches at the end of the string (or line, if /m is used)
   *     Matches the preceding element 0 or more times
   +     Matches the preceding element 1 or more times
   ?     Matches the preceding element 0 or 1 times
   {...} Specifies a range of occurrences for the element preceding it
   [...] Matches any one of the characters contained within the brackets
   (...) Groups regular expressions
   |     Matches either the expression preceding or following it
   \1, \2 ...  The text from the Nth group
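
A small sketch of a few of these metacharacters in use: a backreference
that finds a doubled word, a counted quantifier, and alternation. The
sample strings are made up for the example.

```perl

    use strict;
    use warnings;

    my $line = "this is is a test";

    # (\w+) captures a word; \1 requires the same text again after whitespace.
    my ($repeat) = $line =~ /\b(\w+)\s+\1\b/;

    # {n} demands an exact count; ^ and $ pin the pattern to the whole string.
    my $is_date = "2003-07-20" =~ /^\d{4}-\d{2}-\d{2}$/ ? 1 : 0;

    # | matches either alternative; the parentheses group and capture it.
    my ($answer) = "answer: no" =~ /(yes|no)/;

    print "repeated word: $repeat\n";
    print "looks like a date: $is_date\n";
    print "answer: $answer\n";

```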

=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \033     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character

   \b  An assertion, not backspace, except in a character class.

   \l  Lowercase next character
   \u  Uppercase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End case modification or quoting
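
The case and quoting escapes are easiest to see in a short sketch. The
strings below are invented for the example; C<\Q> is shown next to an
unquoted interpolation to make the difference visible.

```perl

    use strict;
    use warnings;

    my $literal = "a.b*c";

    # With \Q...\E the '.' and '*' in $literal are matched literally.
    my $quoted_hit   = "xa.b*cx" =~ /\Q$literal\E/ ? 1 : 0;

    # Without quoting, '.' and '*' keep their regex meaning and this fails.
    my $unquoted_hit = "xa.b*cx" =~ /$literal/ ? 1 : 0;

    # \L lowercases up to \E; \u uppercases only the next character.
    my $name = "\u\LJOHN smith\E";

    # The same escapes work in the replacement part of s///.
    (my $title = "hello world") =~ s/(\w+)/\u$1/g;

    print "$quoted_hit $unquoted_hit\n";
    print "$name\n";
    print "$title\n";

```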

=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'.
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'.
   [^f-j]   Caret indicates "match any character _except_ these".

The following work within or without a character class:

   \d      A digit, same as [0-9]
   \D      A nondigit, same as [^0-9]
   \w      A word character (alphanumeric plus _), same as [a-zA-Z_0-9]
   \W      A non-word character, same as [^a-zA-Z_0-9]
   \s      A whitespace character, same as [ \t\n\r\f]
   \S      A non-whitespace character, same as [^ \t\n\r\f]
   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with long name
   \PP     Match non-P
   \P{...} Match lack of Unicode property with long name
   \X      Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

   alnum   IsAlnum             Alphanumeric
   alpha   IsAlpha             Alphabetic
   ascii   IsASCII             Any ASCII char
   blank   IsSpace  [ \t]      Horizontal whitespace (GNU)
   cntrl   IsCntrl             Control characters
   digit   IsDigit  \d         Digits
   graph   IsGraph             Alphanumeric and punctuation
   lower   IsLower             Lower case chars (locale aware)
   print   IsPrint             Alphanumeric, punct, and space
   punct   IsPunct             Punctuation
   space   IsSpace  [\s\ck]    Whitespace
           IsSpacePerl   \s    Perl's whitespace definition
   upper   IsUpper             Upper case chars (locale aware)
   word    IsWord   \w         Alphanumeric plus _ (Perl)
   xdigit  IsXDigit [\dA-Fa-f] Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}
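
A brief sketch of the three notations side by side. The sample string is
invented; note that a POSIX class such as C<[:digit:]> has to sit inside
an enclosing bracketed class.

```perl

    use strict;
    use warnings;

    my $s = "Room 42b, floor 3";

    # POSIX class inside a bracketed character class.
    my @numbers = $s =~ /[[:digit:]]+/g;

    # Unicode property for uppercase letters.
    my @caps = $s =~ /\p{IsUpper}/g;

    # Negated class combining \W and \d: word characters that are not digits.
    my @letters = $s =~ /[^\W\d]+/g;

    print "@numbers\n";
    print "@caps\n";
    print "@letters\n";

```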

=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary
   \A Match string start (regardless of /m)
   \Z Match string end (preceding optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

The C</c> modifier (not an anchor; C<\c> introduces a control character)
suppresses resetting of the search position when a C</g> match fails.
Without C</c>, a failed match resets the position to the start of the string.
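
One common use of C<\G> together with the C</gc> modifiers is a small
tokenizer. The sketch below is only an illustration; the token names
and the input string are made up.

```perl

    use strict;
    use warnings;

    my $input = "x = 42";
    my @tokens;

    # Each branch starts with \G, so it must match exactly where the
    # previous match stopped. /c keeps pos() intact when a branch fails,
    # which lets the next branch try at the same position.
    for ($input) {
        while (1) {
            if    (/\G\s+/gc)      { next }
            elsif (/\G([a-z]+)/gc) { push @tokens, "IDENT:$1" }
            elsif (/\G(\d+)/gc)    { push @tokens, "NUM:$1" }
            elsif (/\G(=)/gc)      { push @tokens, "OP:$1" }
            else                   { last }
        }
    }

    print "@tokens\n";

```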

=head2 QUANTIFIERS

Quantifiers are greedy by default and match the B<longest> leftmost.

   Maximal Minimal Allowed range
   ------- ------- -------------
   {n,m}   {n,m}?  Must occur at least n times but no more than m times
   {n,}    {n,}?   Must occur at least n times
   {n}     {n}?    Must match exactly n times
   *       *?      0 or more times (same as {0,})
   +       +?      1 or more times (same as {1,})
   ?       ??      0 or 1 time (same as {0,1})
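
The difference between the maximal and minimal forms shows up as soon as
more than one match length is possible. A short sketch, with invented
sample strings:

```perl

    use strict;
    use warnings;

    my $html = "<b>one</b> and <b>two</b>";

    # Greedy .* runs to the last </b> in the string.
    my ($greedy)  = $html =~ /<b>(.*)<\/b>/;

    # Minimal .*? stops at the first </b> that lets the match succeed.
    my ($minimal) = $html =~ /<b>(.*?)<\/b>/;

    # The same applies to counted ranges.
    my ($most)  = "aaaaa" =~ /(a{2,3})/;
    my ($least) = "aaaaa" =~ /(a{2,3}?)/;

    print "$greedy\n$minimal\n$most\n$least\n";

```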

=head2 EXTENDED CONSTRUCTS

   (?#text)         A comment
   (?:...)          Cluster without capturing.
   (?imxs-imsx:...) Enable/disable option (as per m//)
   (?=...)          Zero-width positive lookahead assertion.
   (?!...)          Zero-width negative lookahead assertion.
   (?<=...)         Zero-width positive lookbehind assertion.
   (?<!...)         Zero-width negative lookbehind assertion.
   (?>...)          Grab what we can, prohibit backtracking.
   (?{ code })      Embedded code, return value becomes $^R.
   (??{ code })     Dynamic regex, return value used as regex.
   (?(cond)yes|no)  cond being int corresponding to capturing parens
   (?(cond)yes)     or a lookaround/eval zero-width assertion.
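
A few of these constructs in a short sketch: clustering without capture,
lookaround used to insert thousands separators, a negative lookahead, and
an inline modifier. All sample data is invented for the example.

```perl

    use strict;
    use warnings;

    # (?:...) groups the optional fraction without creating a capture,
    # so the unit is still $1.
    my ($unit) = "12.5em" =~ /^\d+(?:\.\d+)?(px|em)$/;

    # Lookbehind and lookahead are zero-width: the substitution inserts a
    # comma at each position with a digit before it and a multiple of
    # three digits after it.
    (my $number = "1234567") =~ s/(?<=\d)(?=(?:\d{3})+$)/,/g;

    # Negative lookahead: words starting with "foo" not followed by "bar".
    my @words = "foo foobar food" =~ /\bfoo(?!bar)\w*/g;

    # (?i:...) turns on case-insensitivity for just part of the pattern.
    my $mixed = "Hello WORLD" =~ /Hello (?i:world)/ ? 1 : 0;

    print "$unit\n$number\n@words\n$mixed\n";

```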

=head1 VARIABLES

   $_    Default variable for operators to use
   $*    Enable multiline matching (deprecated; not in 5.9.0 or later)

   $&    Entire matched string
   $`    Everything prior to matched string
   $'    Everything after matched string

The use of those last three will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@LAST_MATCH_START>
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>.

   $1, $2 ...  Hold the Nth captured expression
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. [0] holds start of whole match
   @+    Offsets of ends of groups. [0] holds end of whole match

Capture groups are numbered according to their I<opening> paren.
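
As a sketch of how C<@-> and C<@+> can stand in for the three slow
variables, the following rebuilds the prefix, the match and the suffix
with C<substr>. The sample string and variable names are invented.

```perl

    use strict;
    use warnings;

    my $str = "say hello world now";
    my ($pre, $match, $post, $last, $second_start);

    if ($str =~ /(hel+o)\s+(\w+)/) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        $pre   = substr($str, 0, $-[0]);                # stands in for $`
        $match = substr($str, $-[0], $+[0] - $-[0]);    # stands in for $&
        $post  = substr($str, $+[0]);                   # stands in for $'

        $last         = $+;       # last parenthesized group that matched
        $second_start = $-[2];    # offset where group 2 begins
    }

    print "[$pre] [$match] [$post]\n";

```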

=head1 FUNCTIONS

   lc          Lower case a string
   lcfirst     Lower case first char of a string
   uc          Upper case a string               
   ucfirst     Upper case first char of a string 
   pos         Return or set current match position
   quotemeta   Quote meta characters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use regex to split a string into parts
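
A quick sketch of some of these functions working alongside a regex.
The input values are made up for the example.

```perl

    use strict;
    use warnings;

    # split takes a regex as its separator.
    my @parts = split /\s*,\s*/, "a , b,c";

    # quotemeta backslashes every non-word character, so the result can be
    # interpolated into a pattern and matched literally.
    my $safe     = quotemeta("1+1=2");
    my $is_exact = "1+1=2" =~ /^$safe$/ ? 1 : 0;

    # After a successful scalar m//g, pos returns the offset where the
    # next match attempt on that string will start.
    my $s = "aXbXc";
    $s =~ /X/g;
    my $where = pos($s);

    # The case functions compose in the usual way.
    my $word = ucfirst(lc("PERL"));

    print "@parts\n$safe\n$where\n$word\n";

```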


=head1 AUTHOR

Iain Truskett.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

The L<re> module to alter behaviour.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<locale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl for a
thorough grounding and reference on the topic.

=back
