Geez, I can't recall them covering (?: ) in my books...D'oh!
They may not have. It's not super common to see it thrown about. Most people just use (...), I would guess.
The part about it grouping and capturing things makes sense, as it's the "cousin" of ( ). The part about being able to include the |'s doesn't. I found out, without knowing at the time, that the parentheses breakdown with |'s. I didn't know it at the time, but when I put the ORs in the parentheses and ran the program, I just got the command prompt, no output.
Hmm, this still sounds a little confused. Let's use another example:
#!/usr/bin/perl
use strict; use warnings;
while (<DATA>) {
    print "\nLine: $_";
    if (m/\[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)/) {
        print "\tMatched: \\[([A-Z0-9 -]+)\\] (\\w+).+(?:Steal|Assist|Block|replaced by):? (\\w+)\n";
        print "\t\t\$1 is $1\n\t\t\$2 is $2\n\t\t\$3 is $3\n";
    }
    if (m/\[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)/) {
        print "\tMatched: \\[([A-Z0-9 -]+)\\] (\\w+).+(Steal|Assist|Block|replaced by):? (\\w+)\n";
        print "\t\t\$1 is $1\n\t\t\$2 is $2\n\t\t\$3 is $3\n\t\t\$4 is $4\n";
    }
}
__DATA__
(10:18) [PHX] Stoudemire Turnover: Lost Ball (1 TO) Steal: Jackson (1 ST)
(10:51) [SAN 4-0] Jackson Jump Shot: Made (2 PTS) Assist: Duncan (1 AST)
(9:33) [SAN] Duncan Layup Shot: Missed Block: Stoudemire (2 BLK)
(5:35) [SAN] Bowen Substitution replaced by Ginobili
When I run the above, I get:
Line: (10:18) [PHX] Stoudemire Turnover: Lost Ball (1 TO) Steal: Jackson (1 ST)
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)
        $1 is PHX
        $2 is Stoudemire
        $3 is Jackson
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)
        $1 is PHX
        $2 is Stoudemire
        $3 is Steal
        $4 is Jackson

Line: (10:51) [SAN 4-0] Jackson Jump Shot: Made (2 PTS) Assist: Duncan (1 AST)
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN 4-0
        $2 is Jackson
        $3 is Duncan
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN 4-0
        $2 is Jackson
        $3 is Assist
        $4 is Duncan

Line: (9:33) [SAN] Duncan Layup Shot: Missed Block: Stoudemire (2 BLK)
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN
        $2 is Duncan
        $3 is Stoudemire
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN
        $2 is Duncan
        $3 is Block
        $4 is Stoudemire

Line: (5:35) [SAN] Bowen Substitution replaced by Ginobili
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(?:Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN
        $2 is Bowen
        $3 is Ginobili
    Matched: \[([A-Z0-9 -]+)\] (\w+).+(Steal|Assist|Block|replaced by):? (\w+)
        $1 is SAN
        $2 is Bowen
        $3 is replaced by
        $4 is Ginobili
Notice that the two matches are nearly identical; I just changed the (?:...) to (...) in the second one. They match the same text. The only difference is which variables the expression sets: (...) puts what it matched into a numbered variable, while (?:...) groups without setting one.
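If you want to see that difference in isolation, here is a minimal sketch. It uses one line from the DATA section above with a cut-down pattern of my own (not the full one from the script), and relies on the fact that a match in list context returns its captures:

```perl
#!/usr/bin/perl
use strict; use warnings;

# One line from the DATA section above.
my $line = '(10:18) [PHX] Stoudemire Turnover: Lost Ball (1 TO) Steal: Jackson (1 ST)';

# In list context a successful match returns the captured strings,
# so counting them shows which groups actually capture.
# The patterns are simplified for illustration.
my @with_capture    = $line =~ /(Steal|Assist|Block): (\w+)/;
my @without_capture = $line =~ /(?:Steal|Assist|Block): (\w+)/;

print scalar(@with_capture),    " captures with (...): @with_capture\n";
print scalar(@without_capture), " capture with (?:...): @without_capture\n";
```

The first pattern should hand back two values (Steal and Jackson), the second only one (Jackson), even though both match the same stretch of the line.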
Your other confusion seems to be the | character. You seem to think it's Perl's "or" operator. Not true. We're inside a regex here, so you have to switch your thinking: regex knowledge in, Perl out. | is the regex alternation character, which pretty much means "find this or that", as expected. That's probably why the symbol was chosen: it looks like the or operators of many languages. However, note that & isn't significant in a regex.
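A small sketch of both points, assuming a made-up word list and test string (neither comes from your program):

```perl
#!/usr/bin/perl
use strict; use warnings;

# The word list is invented for this illustration.
my %matched;
for my $word (qw(Steal Block Rebound)) {
    # The | lives inside the pattern and means "Steal or Block".
    # No Perl-level || or "or" is involved.
    $matched{$word} = $word =~ /Steal|Block/ ? 'yes' : 'no';
    print "$word: $matched{$word}\n";
}

# & is an ordinary character in a regex, so this looks for a literal ampersand.
my $ampersand_is_literal = 'AT&T' =~ /T&T/ ? 'yes' : 'no';
print "literal &: $ampersand_is_literal\n";
```

Steal and Block should report yes, Rebound no, and the last line should report yes because & simply matches itself.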
Now, let's get to why | needs the (?:...) or (...) around it. If they weren't there, my regex would read like this:
Find \[([A-Z0-9 -]+)\] (\w+).+Steal OR Assist OR Block OR replaced by:? (\w+)
Instead, it reads like this:
Find \[([A-Z0-9 -]+)\] (\w+).+ Followed By Steal OR Assist OR Block OR replaced by Followed By :? (\w+)
As you can see, I need the parentheses to keep the ORing behavior of | from going too far.
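You can watch the alternation run too far with a pair of much smaller patterns. This is only a sketch; the patterns and test strings are invented for the purpose and are not the ones from the script above:

```perl
#!/usr/bin/perl
use strict; use warnings;

# Without grouping, | splits the WHOLE pattern in two:
#   ^Steal   OR   Block: \w+
my $ungrouped = qr/^Steal|Block: \w+/;

# With grouping, only the words inside the parentheses are alternatives:
#   ^  followed by  Steal OR Block  followed by  ': \w+'
my $grouped = qr/^(?:Steal|Block): \w+/;

for my $text ('Steal', 'Steal: Jackson', 'Block: Stoudemire') {
    printf "%-20s ungrouped=%-3s grouped=%s\n", $text,
        ($text =~ $ungrouped ? 'yes' : 'no'),
        ($text =~ $grouped   ? 'yes' : 'no');
}
```

The bare string Steal should match the ungrouped pattern but not the grouped one, since the ungrouped version is satisfied by ^Steal alone and never insists on the colon and name.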
Hopefully that makes sense.
You might take a trip back to the regex section of your books, if | is new to you. It's regex 101 and I would be super surprised if it isn't covered.
James