Kripa,
Well, the problem with your regex is the anchoring (^ and $) at the
beginning and end of each line.
Here's a simplified version of your regex that produces identical results:
hf:~$ perl -lne 'print for m{^(\w\s*?)+\s*?\\?$}g' input.txt
c
e
Watch what happens when we remove the $ (end-of-line) anchor:
hf:~$ perl -lne 'print for m{^(\w\s*?)+\s*?\\?}g' input.txt
a
d
Now watch what happens when we remove both anchors:
hf:~$ perl -lne 'print for m{(\w\s*?)+\s*?\\?}g' input.txt
a
b
c
d
e
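If you want to rerun all of these side by side without the shell, here is a minimal self-contained sketch. It assumes the same two-line input.txt from your message, embedded as strings with the newlines already stripped (as perl -l does); the helper name matches_of is made up for this example.

```perl
use strict;
use warnings;

# The two lines of input.txt, newlines already stripped (as perl -l does).
# '\\' inside single quotes is a single backslash.
my @lines = ('a b c \\', 'd e');

# The four patterns from this thread, compiled with qr//.
my %variants = (
    original   => qr{ ^ (?: \s* (\w\S*) )+ \s* \\? $ }x,
    simplified => qr{^(\w\s*?)+\s*?\\?$},
    start_only => qr{^(\w\s*?)+\s*?\\?},
    unanchored => qr{(\w\s*?)+\s*?\\?},
);

# Hypothetical helper mimicking: perl -lne 'print for m{...}g' input.txt
sub matches_of {
    my ($re) = @_;
    # m//g in list context returns the captures from every match on a line.
    return map { $_ =~ /$re/g } @lines;
}

for my $name (qw(original simplified start_only unanchored)) {
    my @got = matches_of($variants{$name});
    print "$name: @got\n";
}
# Expected output:
# original: c e
# simplified: c e
# start_only: a d
# unanchored: a b c d e
```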
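Why does this happen? Well, let's look at the one that is anchored only at
the beginning of the line. First the regex engine looks for the
beginning of the line, finds it, and looks for one or more groups of
(\w\s*?) immediately thereafter. The first group matches "a", and since
\s*? is non-greedy it consumes no whitespace, so a second repetition
fails on the space. The rest of the pattern (zero or more whitespace,
non-greedy, then an optional backslash) happily matches the empty
string, so the match succeeds right there, after a single repetition.
In other words, it only captures *one* instance per match, and because
this version is anchored to the beginning of the line, that instance is
the first word.
For the one anchored on both sides (and likewise one anchored only at
the end), the $ forces the engine to backtrack and keep repeating the
group until it reaches the end of the line. A capturing group inside a
quantifier like + only keeps the text from its final repetition; each
repetition overwrites the previous one, so you are left with the last
word. That is exactly what happens to the (\w\S*) in your original.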
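The "keeps only the last repetition" behaviour is a general rule, not something peculiar to this pattern. A small sketch (the sample string 'a b c' is just for illustration) contrasting a quantified capture group with plain m//g:

```perl
use strict;
use warnings;

my $s = 'a b c';

# Anchored at both ends: the group must repeat until the end of the
# string, and $1 ends up holding only the final repetition.
my ($last) = $s =~ /^(?:\s*(\w))+$/;

# Anchored only at the start, with a non-greedy \s*?: one repetition ("a")
# already lets the whole pattern succeed, so no further repetitions happen.
my ($first) = $s =~ /^(\w\s*?)+/;

# No quantifier around the group: /g in list context returns every match
# instead of overwriting a single capture.
my @all = $s =~ /(\w)/g;

print "$last | $first | @all\n";   # prints "c | a | a b c"
```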
I hope that answers your question. To solve your actual problem, though,
you probably want:
hf:~$ perl -lne 'print for m/(\w)/g' input.txt
a
b
c
d
e
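One caveat: m/(\w)/g grabs a single character at a time, which only works here because every word in the sample input is one letter long. A sketch that keeps your whole-line check but collects multi-character words might look like the following. It assumes a "word" means what your original pattern meant (\w\S*: a word character followed by any non-whitespace), and the helper names words_of and words_by_split are hypothetical. The second helper follows the split() route you fell back on.

```perl
use strict;
use warnings;

# Hypothetical helper: validate the line first, then collect the words.
# Returns an empty list if the line doesn't have the expected shape.
sub words_of {
    my ($line) = @_;
    # Whole-line check from the original pattern, with (?:...) so that
    # nothing is captured here.
    return () unless $line =~ m{ ^ (?: \s* \w\S* )+ \s* \\? $ }x;
    # m//g in list context returns every word; a lone backslash is not \w.
    return $line =~ m{ (\w\S*) }xg;
}

# Hypothetical split()-based alternative: split on whitespace and drop a
# lone trailing backslash.
sub words_by_split {
    my ($line) = @_;
    my @words = split ' ', $line;
    pop @words if @words && $words[-1] eq '\\';
    return @words;
}

for my $line ('foo bar baz \\', 'qux quux') {
    print join(',', words_of($line)), ' / ',
          join(',', words_by_split($line)), "\n";
}
# prints:
# foo,bar,baz / foo,bar,baz
# qux,quux / qux,quux
```

Note that, as with \w\S* in your original, a backslash glued to the last word (c\) would be kept as part of that word by words_of.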
Keeping it simple, of course.
Thanks,
Jordan M. Adler
On Thu, Oct 25, 2012 at 3:57 PM, Kripa Sundar <[email protected]> wrote:
> Folks,
>
> I need help to reset my brain, w.r.t. an apparently-straightforward regex.
> I am sure I am missing something obvious here.
>
> I keep thinking that the regex below ought to get me all the words on a line
> except the backslash. Instead I am only getting the final word.
>
> (In my code, I threw out the regex, and resorted to split(). But my mental
> blind spot on this regex is bothering me.)
>
>> % cat > input.txt
>> a b c \
>> d e
>> %
>> % perl -lne 'print for m{ ^ (?: \s* (\w\S*) )+ \s* \\? $ }xg' !$
>> c
>> e
>> %
>
> peace, || Just the facts, ma'am:
> --{kr.pA} || http://www.washingtonpost.com/blogs/fact-checker/
> --
> Yoga: origami for people.
>
> _______________________________________________
> Boston-pm mailing list
> [email protected]
> http://mail.pm.org/mailman/listinfo/boston-pm