Re: grabbing text between two tokens

Chas. Owens Mon, 14 Apr 2008 18:08:50 -0700

On Mon, Apr 14, 2008 at 7:40 PM,  <[EMAIL PROTECTED]> wrote:
> Thanks Gary,
>  But some of us may wish to know why  $str =~ m/tokena(.*)tokenb/ms; did not
> get all the things between tokena and tokenb into $str?
>
>  This is how my mind think:-
>  (1) The /s switch means to be able to match across newlines.
>  (2) .* means to match zero or more of anything.
snip


Given the string "tokena foo bar tokenb tokena baz tokenb", the .* in the
regex will match " foo bar tokenb tokena baz " because quantifiers are
greedy (they try to match the longest string* possible) by default.  By
adding the ? quantifier modifier (.*? instead of .*), you tell the
quantifier, * in this case, to match the shortest string** possible.
The resulting match will therefore be " foo bar " (including the
surrounding spaces).  I suggest using Text::Balanced because it
makes extracting delimited text easier.  For instance, the tags might
be nested like this: "tokena foo tokena bar tokenb tokenb tokena baz
tokenb".  In that case, neither of the two regexes (greedy or
non-greedy) will work.  You have to start doing things like zero-width
negative/positive look-behinds/look-aheads.  I find it easier to
specify what I want to Text::Balanced and let it write those regexes
for me (I am Lazy***).

* that allows the pattern as a whole to match
** see above
*** one of the three virtues of a programmer
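
To make the difference concrete, here is a minimal sketch of the three
cases described above.  The variable names are mine, and the
Text::Balanced call is only one plausible way to use the module
(extract_tagged with plain-string delimiters is an assumption about
what fits this data, not the only way to do it).

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_tagged);

my $str = "tokena foo bar tokenb tokena baz tokenb";

# Greedy: .* runs to the LAST tokenb that still lets the pattern match.
my ($greedy) = $str =~ /tokena(.*)tokenb/s;
print "greedy: [$greedy]\n";    # [ foo bar tokenb tokena baz ]

# Non-greedy: .*? stops at the FIRST tokenb after the opening tokena.
my ($lazy) = $str =~ /tokena(.*?)tokenb/s;
print "lazy:   [$lazy]\n";      # [ foo bar ]

# With /g in list context the non-greedy form returns every pair.
my @all = $str =~ /tokena(.*?)tokenb/gs;
print "all:    [$_]\n" for @all;

# Nested delimiters: the non-greedy regex stops at the inner tokenb,
# so the capture is unbalanced.
my $nested = "tokena foo tokena bar tokenb tokenb tokena baz tokenb";
my ($lazy_nested) = $nested =~ /tokena(.*?)tokenb/s;
print "nested, regex: [$lazy_nested]\n";    # [ foo tokena bar ]

# Text::Balanced tracks nesting.  In list context extract_tagged
# returns (extracted, remainder, prefix, open tag, contents, close tag).
my ($extracted, $remainder, $prefix, $open, $inner, $close) =
    extract_tagged($nested, 'tokena', 'tokenb');
print "nested, T::B:  [$inner]\n" if defined $inner;
```

Note that the capture lands in $1 (or in the list returned by the
match), not in $str; $str itself is left unchanged by m//.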

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
