On Mon, Apr 14, 2008 at 7:40 PM, <[EMAIL PROTECTED]> wrote:
> Thanks Gary,
> But some of us may wish to know why $str =~ m/tokena(.*)tokenb/ms; did not
> get all the things between tokena and tokenb into $str?
>
> This is how my mind think:-
> (1) The /s switch means to be able to match across newlines.
> (2) .* means to match zero or more of anything.
snip

Given a string "tokena foo bar tokenb tokena baz tokenb", the .* in the
regex will match " foo bar tokenb tokena baz " because it is greedy (it
tries to match the longest string* possible) by default. By adding the ?
quantifier modifier (that is, writing .*? instead of .*), you tell the
quantifier, * in this case, to match the shortest string** possible. The
resulting capture will therefore be " foo bar ".

I suggest using Text::Balanced because it makes extracting delimited text
easier. For instance, the tags could be nested like this: "tokena foo
tokena bar tokenb tokenb tokena baz tokenb". In that case, neither of the
two regexes (greedy or non-greedy) will work: the greedy one runs on to the
last tokenb, and the non-greedy one stops at the inner tokenb. You have to
start doing things like zero-width negative/positive look-behinds and
look-aheads. I find it easier to specify what I want to Text::Balanced and
let it write those regexes for me (I am Lazy***).

* that allows the pattern as a whole to match
** see above
*** one of the three virtues of a programmer

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
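
To make the greedy versus non-greedy difference concrete, here is a minimal sketch using the two example strings from the explanation above. The variable names are only illustrative. The last step calls extract_tagged from Text::Balanced with 'tokena' and 'tokenb' as the opening and closing tag patterns; this is one plausible way to use the module for this case, not necessarily the call the original thread had in mind, so check the output on your own data.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

# Flat (non-nested) case from the explanation above.
my $flat = "tokena foo bar tokenb tokena baz tokenb";

# Greedy: .* runs on to the LAST tokenb that still lets the pattern match.
my ($greedy) = $flat =~ m/tokena(.*)tokenb/s;

# Non-greedy: .*? stops at the FIRST tokenb it can.
my ($lazy) = $flat =~ m/tokena(.*?)tokenb/s;

print "greedy: [$greedy]\n";    # [ foo bar tokenb tokena baz ]
print "lazy:   [$lazy]\n";      # [ foo bar ]

# Nested case: the non-greedy regex now cuts off at the inner tokenb.
my $nested = "tokena foo tokena bar tokenb tokenb tokena baz tokenb";
my ($lazy_nested) = $nested =~ m/tokena(.*?)tokenb/s;
print "lazy on nested: [$lazy_nested]\n";    # [ foo tokena bar ]

# Assumption: extract_tagged is asked to balance tokena/tokenb pairs.
# In list context, element 4 is the text between the outermost tags.
my $contents = (extract_tagged($nested, 'tokena', 'tokenb'))[4];
print defined $contents
    ? "balanced: [$contents]\n"
    : "no balanced match found\n";
```

Note that the captured text ends up in the list returned by the match (or in $1), not in the string the regex was bound to with =~.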