Detlef Lindenthal <[EMAIL PROTECTED]> wrote:
> Or is it superfluous because usable tutorials already do exist?
The regex (regular expression processing) in Perl owes a great deal to AWK,
which indeed was one of the primary inspirations for Perl.  AWK stands for
Aho, Weinberger, and Kernighan. I learned Awk from "The AWK Book" (properly
"The AWK Programming Language", by those same three authors), a tiny volume,
which I consider excellent. It was published in 1988. Should be easy to find.
It has many practical examples. Testimony to how well Awk was designed is
that the regex syntax in Perl differs from it only by a few details here and
there.

Once you have used the Awk book to get the basic ideas, you can read about
the "s" and "m" operators in pod (just put an s in a MacPerl edit window,
select it, and type cmd-H (for Help), and pod will display the information
about s). Try to find some differences between the Awk and Perl ways of
using s! They exist, but you may have a hard time finding them.


Tony wrote:
>>> I want to be able to grab a section of a string, starting at X and ending at
>>> Y.

Detlef Lindenthal wrote:
>> ##  Grab the amount like this:

>> $text = "The amount of the house is one hundred thousand dollars, and I
>> cannot afford that price."; $X = "amount of the house is"; $Y = ",";

>> $text =~ m,$X(.*?)$Y,;
>> print $1;  ## This prints: " one hundred thousand dollars"


Tony wrote:
> I am not sure what the syntax is exactly, but it works great!

(1) One caution: this solution gives the right answer only because of the
"?" after ".*".  For example, suppose there is MORE THAN ONE occurrence of $Y
(a comma) in the sentence following $X ("The amount ... dollars"). The
non-greedy (.*?) stops at the FIRST $Y. Without the "?", the greedy (.*)
would capture everything between $X and the FINAL $Y (final comma).

(2) This greedy versus non-greedy problem was discussed at considerable
length in this mailing list some months ago. I don't know very well how to
look things up in the archives of this mailing list, so I will leave that to
someone else.
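
For anyone who wants to check the multiple-comma case directly, here is a
small test, offered as a sketch only. The sentence with two commas is my own
made-up example, and the \Q...\E around $X and $Y (which quotes any regex
metacharacters in the interpolated strings) is my addition, not part of the
solution quoted above. As far as I can tell, the non-greedy (.*?) stops at the
first comma and the greedy (.*) runs to the last one.

```perl
use strict;
use warnings;

# Hypothetical sentence with TWO commas after $X (not from the original post).
my $text = "The amount of the house is one hundred thousand dollars, and I "
         . "cannot afford that price, sadly.";
my $X = "amount of the house is";
my $Y = ",";

# Non-greedy: (.*?) takes as few characters as possible, so the match
# ends at the first comma after $X.
my ($lazy) = $text =~ m/\Q$X\E(.*?)\Q$Y\E/;

# Greedy: (.*) takes as many characters as possible, so the match
# ends at the last comma after $X.
my ($greedy) = $text =~ m/\Q$X\E(.*)\Q$Y\E/;

print "non-greedy: [$lazy]\n";
print "greedy:     [$greedy]\n";
```

Assigning the match to a list, as in my ($lazy) = ..., returns the captured
text directly, which avoids depending on $1 after a match that might have
failed.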

(3) Detlef wrote:
> $text   contains your string
> =~   means: apply some regex ("regular expression" = search pattern or search
> and replace pattern) on it
> m, ....... ,   means: what is between the two commas (or some other 2
> characters) shall be found (m stands for "match").
> $X and $Y are interpolated; that means you could as well write
> $text =~ m'amount of the house is(.*?),';
> .  means: any character except \n in this case
> .* means: any count of those characters from zero to infinite
> ? means: as few as possible (= nongreedy search)
> (.*?) means: capture everything within these parens and return it named $1.

Unfortunately some of this is stated so loosely that it is either subject to
misinterpretation or possibly incorrect.  Here is my view of a more correct
version.

$text   contains your string.
=~   means: apply some regex ("regular expression" = search pattern or
search and replace pattern) on it.
m/REGEX/  means: Match the regular expression between the two slashes (or
some other 2 equal characters) to $text.
. means any character except \n
.* means any consecutive string (even the empty string) of any characters,
excluding \n.
.+ means any non-empty consecutive string of any characters, excluding \n.
In REGEX, expressions like $X are expanded before the matching starts.
Now you can do the matching by
    $text =~ m/$X.*$Y/
The text matched by .* is what you want. But how are you going to refer to
it?
The answer is to put parentheses around any matching object you want to
refer to:  $text =~ m/$X(.*)$Y/
Now the first such expression is referred to as $1, the second as $2, etc.
Parentheses also serve another purpose. You can put them around any
legitimate expression and then put * or + or ? after.
(REGEX)* matches zero or more consecutive strings that each match REGEX.
(REGEX)+ is the same except that there must be at least 1 such string.
(REGEX)? matches either 0 or 1 strings that match REGEX.
This does not quite cover all the syntax you can use in Awk s, but there is
only a little more, such as "|" for the OR operator.
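
To make the points above concrete, here is a short sketch that can be pasted
into a script and run. The sample strings ("color=red", "xaby" and so on) are
made up for illustration, and the ^ and $ anchors are an addition of mine that
is not covered above; they tie the match to the whole string so that the
difference between *, + and ? is visible.

```perl
use strict;
use warnings;

# Hypothetical sample string, made up for illustration.
my $pair = "color=red";

# Two pairs of parentheses: the first capture is $1, the second is $2.
# In list context the match returns both captures directly.
my ($key, $value) = $pair =~ m/(.+)=(.+)/;
print "key: $key, value: $value\n";

# Parentheses used for grouping, followed by *, + or ?.
# ^ and $ (not covered above) anchor the match to the whole string.
my @results;
for my $s ("xy", "xaby", "xababy") {
    my $star = $s =~ m/^x(ab)*y$/ ? 1 : 0;  # zero or more "ab"
    my $plus = $s =~ m/^x(ab)+y$/ ? 1 : 0;  # one or more "ab"
    my $opt  = $s =~ m/^x(ab)?y$/ ? 1 : 0;  # zero or one "ab"
    push @results, "$s: *=$star +=$plus ?=$opt";
}
print "$_\n" for @results;

# "|" is the OR operator: match either alternative.
my $has_pet = "I have a dog" =~ m/cat|dog/ ? 1 : 0;
print "pet found: $has_pet\n";
```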

