Detlef Lindenthal <[EMAIL PROTECTED]> wrote:
> Or is it superfluous because usable tutorials already do exist?

The regex (regular expression) processing in Perl owes a great deal to AWK, which was indeed one of the primary inspirations for Perl. AWK stands for Aho, Weinberger, and Kernighan. I learned Awk from "The AWK Book", a tiny volume, which I consider excellent. It was probably published in the early 90's, though possibly earlier. It should be easy to find, and it has many practical examples. A testimony to how well Awk was designed is that the version of its regex syntax embedded in Perl differs only in a few details here and there.

Once you have used the Awk book to get the basic ideas, you can read about the "s" and "m" operators in pod (just put an s in a MacPerl edit window, select it, and type cmd-H (for Help), and pod will display the information about s). Try to find some differences between the Awk and Perl ways of doing substitutions! They exist, but you may have a hard time finding them.

Tony wrote:
>>> I want to be able to grab a section of a string, starting at X and ending at
>>> Y.

Detlef Lindenthal wrote:
>> ## Grab the amount like this:
>> $text = "The amount of the house is one hundred thousand dollars, and I cannot afford that price.";
>> $X = "amount of the house is";
>> $Y = ",";
>> $text =~ m,$X(.*?)$Y,;
>> print $1; ## This prints: " one hundred thousand dollars"

Tony wrote:
> I am not sure what the syntax is exactly, but it works great!

(1) Be aware that this solution gives the right answer only because of the ? after .* in the pattern. Suppose there is MORE THAN ONE occurrence of $Y (a comma) in the sentence following $X ("The amount ... dollars"). With the ?, the match stops at the FIRST $Y. Without it, the match would include everything between $X and the FINAL $Y (final comma).

(2) This problem was discussed at considerable length in this mailing list some months ago. I no longer remember the details of that discussion, and I don't know very well how to look things up in the archives of this mailing list, so I will leave that to someone else.

(3) Detlef wrote:
> $text contains your string
> =~ means: apply some regex ("regular expression" = search pattern or search
> and replace pattern) on it
> m, ....... , means: what is between the two commas (or some other 2
> characters) shall be found (m stands for "match").
> $X and $Y are interpolated; that means you could as well write
> $text =~ m'amount of the house is(.*?),';
> . means: any character except \n in this case
> .* means: any count of those characters from zero to infinite
> ? means: as few as possible (= nongreedy search)
> (.*?) means: capture everything within these parens and return it named $1.

Unfortunately some of this is stated so loosely that it is either subject to misinterpretation or possibly incorrect. Here is my view of a more correct version.

$text contains your string.

=~ means: apply some regex ("regular expression" = search pattern or search and replace pattern) to it.

m/REGEX/ means: match the regular expression between the two slashes (or some other pair of identical delimiter characters) against $text.

.  means any single character except \n.
.* means any consecutive string (even the empty string) of characters, excluding \n.
.+ means any non-empty consecutive string of characters, excluding \n.

In REGEX, variables like $X are expanded (interpolated) before the matching starts.

Now you can do the matching by

    $text =~ m/$X.*$Y/

The text matched by .* is what you want. But how are you going to refer to it? The answer is to put parentheses around any part of the pattern whose match you want to refer to:

    $text =~ m/$X(.*)$Y/

Now the text matched by the first such parenthesized expression is referred to as $1, the second as $2, etc. Note that * matches as much as it can, so (.*) here runs up to the last $Y in the string. Write (.*?), as Detlef did, to stop at the first $Y.

Parentheses also serve another purpose. You can put them around any legitimate expression and then put * or + or ? after it.

(REGEX)* matches zero or more consecutive substrings, each of which matches REGEX.
(REGEX)+ is the same except that there must be at least 1 such substring.
(REGEX)? matches either 0 or 1 substrings that match REGEX.

This does not quite cover all the syntax you can use in Awk's regular expressions, but there is only a little more, such as "|" for the OR operator.
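
To make the difference between .* and .*? concrete, here is a minimal sketch in Perl. The sample sentence is Detlef's with a second comma added (my assumption, for illustration only), and the variable names $greedy, $lazy, $quoted and @hits are hypothetical. The \Q...\E part is an extra precaution not mentioned above: it makes Perl treat the interpolated $X and $Y as literal text rather than as regex syntax.

```perl
use strict;
use warnings;

# Sample text with TWO commas after $X (an assumption for illustration;
# the sentence discussed above has only one).
my $text = "The amount of the house is one hundred thousand dollars, "
         . "and I cannot afford that price, sadly.";
my $X = "amount of the house is";
my $Y = ",";

# Greedy: .* takes as much as it can, so the capture runs to the LAST comma.
my ($greedy) = $text =~ m/$X(.*)$Y/;

# Non-greedy: .*? takes as little as it can, so the capture stops at the
# FIRST comma.
my ($lazy) = $text =~ m/$X(.*?)$Y/;

# \Q...\E quotes any regex metacharacters inside the interpolated variables.
my ($quoted) = $text =~ m/\Q$X\E(.*?)\Q$Y\E/;

print "greedy: [$greedy]\n";  # [ one hundred thousand dollars, and I cannot afford that price]
print "lazy:   [$lazy]\n";    # [ one hundred thousand dollars]
print "quoted: [$quoted]\n";  # [ one hundred thousand dollars]

# Parentheses for grouping, with | (OR) inside and ? (0 or 1) after.
my @hits = grep { /^(cat|dog)s?$/ } qw(cat dogs bird cats);
print "hits: @hits\n";        # hits: cat dogs cats
```

In list context, as used here, the match returns the captured text directly, so there is no need to read $1 afterwards.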