> If you have time could you explain what each part of the regex is
> doing, so that I can learn for future?

Sure. :)


Starting with the original PHP one:
/(\s)(\w+)$/i

the /.../i means "a regular expression, using the i flag"
the i flag is "case-Insensitive"

the (\s) is "a group consisting of a single whiteSpace character
(space, tab, newline, etc.)"

the (\w+) is "a group consisting of one or more Word characters"
(letters, digits and underscore)
the + part of that means "one or more"

the $ at the end means "end of line" or "end of string", depending
on whether multi-line mode is switched on or not. It matches the
position, not a character (it is a zero-width match).
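
To see those pieces working together, here is a small sketch in Python
(purely as an illustration - it is not one of the languages from this
thread, and the sample string is made up). Python has no /.../i
syntax, so the i flag is passed as re.IGNORECASE instead. One Python
quirk to be aware of: its $ also matches just before a trailing
newline.

```python
import re

# Same pattern as the PHP one; the i flag becomes re.IGNORECASE.
pattern = re.compile(r"(\s)(\w+)$", re.IGNORECASE)

sample = "apples, pears bananas"  # hypothetical input
match = pattern.search(sample)

whitespace = match.group(1)  # group 1: the single whitespace character
last_word = match.group(2)   # group 2: the run of word characters

# $ is zero-width: the match ends exactly at the end of the string,
# and nothing extra was consumed to get there.
ends_at_end = match.end() == len(sample)

print(repr(whitespace), repr(last_word), ends_at_end)
```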


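In PHP the slashes are delimiters marking where the pattern starts and
ends, with any flags written after the closing slash. The preg
functions always require delimiters, even when there are no flags
(characters other than the slash can be used too).
JavaScript has the same convention built in as unquoted /.../ regex
literals, whilst CFML does not use the slash convention at all.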

In PHP and CFML there is an alternative way to specify flags, using
the construct (?i) at the start of the pattern (JavaScript has
traditionally not supported this). Note that anything of the form
(?...) is a special group that acts differently to ordinary groups -
the construct is used for a lot more than just flags, primarily
lookarounds, but other stuff too.
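
As a rough illustration of the (?i) construct, this Python sketch
(again, Python is just a stand-in here, and the text is made up)
compares a separately-passed flag, the inline form, and no flag at
all:

```python
import re

text = "Hello WORLD"  # hypothetical input

# Flag passed separately - the Python counterpart of /.../i
with_flag = re.search(r"world$", text, re.IGNORECASE)

# Same flag written inside the pattern using (?i)
inline = re.search(r"(?i)world$", text)

# With neither, the lowercase pattern cannot match the uppercase text
no_flag = re.search(r"world$", text)

print(with_flag.group(0), inline.group(0), no_flag)
```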

In CFML, you can use reReplaceNoCase instead of the i flag, which is
more readable for people that don't know regex.


In the replace string of " and $2" or " and \2" in the PHP/CFML ones
respectively, the $2 or \2 is a backreference referring to group 2 -
i.e. the second pair of parentheses, containing the \w+ ("one or more
word characters").
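
A minimal sketch of the backreference in action, using Python as a
stand-in (it spells the backreference \2, like CFML; the function name
add_and and the sample strings are just made up for the example):

```python
import re

def add_and(text):
    # \1 would be the whitespace group; \2 is the word captured by
    # the second pair of parentheses.
    return re.sub(r"(\s)(\w+)$", r" and \2", text, flags=re.IGNORECASE)

print(add_and("apples, pears bananas"))
```

Note that the whitespace matched by group 1 is thrown away and a
literal space from the replace string goes back in its place.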


In the simplified version, I removed the flag, removed the groups,
removed the space, and changed the replace string, so we ended up
with:

reReplace( wkstr , '\w+$','and \0' )

The flag was unnecessary - the \w means "any word character" and
already matches both upper and lower case letters, so making the
expression case-insensitive changes nothing.

The \s was unnecessary because the \w+ will match all of the
characters up to the space, and we were just putting the space back
in, so I took out both the \s and the space before the "and".

With the \s gone, there was no need to group the characters and use \2
(or \1 as it would have become) - instead I used \0 in the replace
string, which means "the entire content of the match", rather than one
of the groups within it.
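
To compare the two versions side by side, here is a sketch in Python
(a stand-in again - Python writes "the entire match" as \g<0> where
the CFML call uses \0, and the function names and inputs are invented
for the example):

```python
import re

def original(text):
    # Grouped version: whitespace in group 1, last word in group 2.
    return re.sub(r"(\s)(\w+)$", r" and \2", text, flags=re.IGNORECASE)

def simplified(text):
    # No flag, no groups: \g<0> stands for the whole match.
    return re.sub(r"\w+$", r"and \g<0>", text)

for sample in ["apples, pears bananas", "Red GREEN Blue"]:
    print(original(sample), "|", simplified(sample))
```

One difference worth knowing about: if the text is a single word with
no whitespace before it, the original pattern does not match at all,
whereas the simplified one does and puts "and " in front of the word.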


So, I think that covers everything - hopefully it all makes sense, and
wasn't information overload.
Let me know if you'd like any part clarified/re-worded. :)
