On Mar 10, 2004, at 10:20 AM, Stuart White wrote:

Geez, I can't recall them covering (?: ) in my
books...D'oh!

It may not have. It's not super common to see it thrown about. Most people just use (...), I would guess.


The part about it grouping and
capturing things makes sense, as it's the "cousin" of
( ).  The part about being able to include the |'s
doesn't.  I found out, without knowing at the time,
that the parentheses break down with |'s.  I didn't
know it at the time, but when I put the ORs in the
parentheses and ran the program, I just got the
command prompt, no output.

Hmm, this still sounds a little confused. Let's use another example:


#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    print "\nLine: $_";

    # (?:...) groups the alternatives without capturing: three variables are set
    if (m/\[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)/) {
        print "\tMatched: \\[([A-Z0-9 -]+)\\] (\\w+).+(?:Steal|Assist|Block|replaced by):? (\\w+)\n";
        print "\t\t\$1 is $1\n\t\t\$2 is $2\n\t\t\$3 is $3\n";
    }

    # (...) groups AND captures, so the matched alternative becomes $3: four variables
    if (m/\[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)/) {
        print "\tMatched: \\[([A-Z0-9 -]+)\\] (\\w+).+(Steal|Assist|Block|replaced by):? (\\w+)\n";
        print "\t\t\$1 is $1\n\t\t\$2 is $2\n\t\t\$3 is $3\n\t\t\$4 is $4\n";
    }
}


__DATA__
(10:18) [PHX] Stoudemire Turnover: Lost Ball (1 TO) Steal: Jackson (1 ST)
(10:51) [SAN 4-0] Jackson Jump Shot: Made (2 PTS) Assist: Duncan (1 AST)
(9:33) [SAN] Duncan Layup Shot: Missed Block: Stoudemire (2 BLK)
(5:35) [SAN] Bowen Substitution replaced by Ginobili


When I run the above, I get:

Line: (10:18) [PHX] Stoudemire Turnover: Lost Ball (1 TO) Steal: Jackson (1 ST)
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)
        $1 is PHX
        $2 is Stoudemire
        $3 is Jackson
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)
        $1 is PHX
        $2 is Stoudemire
        $3 is Steal
        $4 is Jackson

Line: (10:51) [SAN 4-0] Jackson Jump Shot: Made (2 PTS) Assist: Duncan (1 AST)
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN 4-0
        $2 is Jackson
        $3 is Duncan
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN 4-0
        $2 is Jackson
        $3 is Assist
        $4 is Duncan

Line: (9:33) [SAN] Duncan Layup Shot: Missed Block: Stoudemire (2 BLK)
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN
        $2 is Duncan
        $3 is Stoudemire
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN
        $2 is Duncan
        $3 is Block
        $4 is Stoudemire

Line: (5:35) [SAN] Bowen Substitution replaced by Ginobili
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN
        $2 is Bowen
        $3 is Ginobili
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN
        $2 is Bowen
        $3 is replaced by
        $4 is Ginobili


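Notice that the two matches are nearly identical; I just changed the (?:...) to (...) in the second one. They match the same lines, and the only difference is which variables the expression sets. (?:...) groups without capturing, so it doesn't set a variable. That's why the player name is $3 in the first match but $4 in the second.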
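If it helps, here is a minimal sketch that runs both patterns against the Steal line from the DATA section and catches the results in list context, where a successful match returns its captured groups. The @grouped and @captured names are just made up for the illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my $line = '(10:18) [PHX] Stoudemire Turnover: Lost Ball (1 TO) Steal: Jackson (1 ST)';

# (?:...) groups the alternatives but captures nothing
my @grouped = $line =~ m/\[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)/;

# (...) groups the alternatives AND captures whichever one matched
my @captured = $line =~ m/\[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)/;

print scalar(@grouped),  " captures: @grouped\n";    # 3 captures: PHX Stoudemire Jackson
print scalar(@captured), " captures: @captured\n";   # 4 captures: PHX Stoudemire Steal Jackson
```

Same match, one fewer capture: the only cost of (...) over (?:...) is that every group after it moves down a number.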

Your other confusion seems to be the | character. You seem to think it's Perl's "or" operator. Not true, at least not here. We're inside a regex, so we've got to switch our thinking: regex knowledge in, Perl out. | is the regex alternation character, which pretty much means "match this or this," as expected. That's probably why the symbol was chosen: it looks like the or operator of many languages. However, note that & has no special meaning in a regex; there is no matching "and."
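Here is a small sketch of | on its own, using fragments made up from the play-by-play lines above. It also checks that & is just a literal character inside a regex. The last test, joining two separate matches with Perl's own && when you really do need both words, is my own addition for contrast, not something your books or the script above use.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# | is alternation: a string matches if ANY one alternative is found in it
my @matched = grep { /Steal|Assist|Block/ }
    ('Steal: Jackson', 'Assist: Duncan', 'Lost Ball', 'Block: Stoudemire');
print "@matched\n";   # Steal: Jackson Assist: Duncan Block: Stoudemire

my $both = 'Block: Stoudemire Assist: Duncan';

# & is NOT "and" in a regex; this only matches the literal text "Block&Assist"
my $amp_result = ($both =~ /Block&Assist/) ? 'matched' : 'no match';

# To require both words, use two matches joined with Perl's && operator
my $and_result = ($both =~ /Block/ && $both =~ /Assist/) ? 'matched' : 'no match';

print "$amp_result\n";   # no match
print "$and_result\n";   # matched
```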

Now, let's get to why | needs the (?:...) or (...) around it. If they weren't there, my regex would read like this:

Find
        \[([A-Z0-9 -]+)\] (\w+).+Steal
OR
        Assist
OR
        Block
OR
        replaced by:? (\w+)

Instead, it reads like this:

Find
        \[([A-Z0-9 -]+)\] (\w+).+
Followed By
                Steal
        OR
                Assist
        OR
                Block
        OR
                replaced by
Followed By
        :? (\w+)

As you can see, I need the parentheses to keep the ORing behavior of | from going too far.
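You can watch this happen with a quick sketch that runs the pattern with and without the (?:...) against the Assist line from the DATA section. The $ungrouped, $grouped and $team names are only for illustration; $& is Perl's variable holding whatever the last successful match matched.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my $line = '(10:51) [SAN 4-0] Jackson Jump Shot: Made (2 PTS) Assist: Duncan (1 AST)';
my ($ungrouped, $grouped, $team);

# No grouping: | splits the WHOLE pattern into four alternatives, i.e.
#   \[([A-Z0-9 -]+)\] (\w+).+Steal  OR  Assist  OR  Block  OR  replaced by:? (\w+)
# This line has no "Steal", so the bare "Assist" alternative is what matches.
if ($line =~ m/\[([A-Z0-9 -]+)\] (\w+).+Steal|Assist|Block|replaced by:? (\w+)/) {
    $ungrouped = $&;
    $team      = defined $1 ? $1 : 'undef';   # group 1 never took part in the match
}

# Grouped: (?:...) keeps the alternation between ".+" and ":? (\w+)"
if ($line =~ m/\[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)/) {
    $grouped = $&;
}

print "ungrouped matched: '$ungrouped' (\$1 is $team)\n";
# ungrouped matched: 'Assist' ($1 is undef)
print "grouped matched:   '$grouped'\n";
# grouped matched:   '[SAN 4-0] Jackson Jump Shot: Made (2 PTS) Assist: Duncan'
```

Both versions "match" the line, which is what makes this bug sneaky: without the parentheses the match succeeds, but on the wrong thing, and none of your variables get set.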

Hopefully that makes sense.

You might take a trip back to the regex section of your books, if | is new to you. It's regex 101 and I would be super surprised if it isn't covered.

James





