> they are irrelevant and unavailable to this conversion exercise. I
> could make them compliant and ignore them. But since this is more a
> learning exercise than anything...

Fair enough on both counts. :)

> The main features I do not grok is what role the [^>] plays

That's for if you don't know (or don't want to limit) what attributes a tag contains. A tag cannot contain a > character - any necessary ones would be escaped as &gt;

You could use a non-greedy wildcard like <tag.*?> but I use <tag[^>]*> as it is more precise.

> how to interpret this part in front of the negative look behind; ?:[^/]/

Ah, you're mis-reading that slightly. There are a few parts at play here, so I'll attempt to explain them individually in a simpler context...

(note, I'm adding spaces purely for readability - pretend there are no spaces in any of the following examples)

( x | y ) is the standard "x OR y" - the parentheses are necessary to prevent the OR from applying to the whole of the expression.

However, using parentheses means that regex will capture the contents for a backreference. This is not necessary here, so to tell it to discard the contents we put ?: inside the parens, giving us (?: x | y )

p.s. this also works without the OR operator - just as (?: x )

The first part of the OR is [^/] which means simply "not /" - putting a caret (^) as the first character inside brackets negates them. e.g. [^abc] means "a single character that is neither a nor b nor c"

Then, there's a negative lookahead, which is (?! x ) and is the inverse of a regular lookahead - i.e. it makes sure the contents of the parens are NOT there.

As with all lookarounds, it is zero-width - it matches only a position, not actual characters. That is perhaps the key to understanding how they work - no characters are ever consumed by a lookaround, but its contents are still tested against the characters that follow the current position. Since we're dealing with a position, we need a preceding character to actually proceed with the match.

For example, x (?! y ) will match any x that is not followed by a y (but it will match only the x, and will continue checking the rest of the pattern from the next character).
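
If it helps to see the individual pieces in action, here is a minimal sketch using Python's re module (an assumption on my part - the syntax for these constructs is the same in most regex engines, but the sample strings and variable names are made up purely for illustration):

```python
import re

# [^>]* soaks up any attributes without running past the end of the tag.
html = '<td class="a">one</td><td>two</td>'
tags = re.findall(r'<td[^>]*>', html)
print(tags)

# ( x | y ) captures its contents; (?: x | y ) groups without capturing.
capturing = re.search(r'(cat|dog)s', 'dogs').groups()
non_capturing = re.search(r'(?:cat|dog)s', 'dogs').groups()
print(capturing, non_capturing)

# [^abc] is a single character that is not a, b or c.
not_abc = re.findall(r'[^abc]', 'abcde')
print(not_abc)

# x (?! y ) matches an x not followed by y; only the x itself is consumed.
positions = [m.start() for m in re.finditer(r'x(?!y)', 'xy xz x')]
print(positions)
```

The last line prints the positions of the second and third x only - the first x is rejected because the lookahead sees a y straight after it.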

Since I mentioned the non-capturing (?: x ) above, I'll point out that this behaviour is implicit in all lookarounds - they do not capture their contents for backreferences.

So, to put all that together, what all this (?: [^/] | / (?! td> ) ) is actually saying is:

Look for anything that is not a slash, OR if you do find a slash only accept it if it is not followed by the characters "td>", and when you find either of these don't bother remembering it and just move on.

Or, put more simply, "if you find /td> in this section then stop trying to match"

Hopefully all of that makes sense? Feel free to ask if any part is unclear. :)

--
Peter Boughton
//hybridchill.com
//blog.bpsite.net
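
P.S. To try the combined group for yourself, here is a rough sketch in Python (again an assumption - any engine with lookaheads should behave the same). The surrounding <td[^>]*> and </td>, the capturing parentheses, and the sample row are all hypothetical and only there to give the group something to work on:

```python
import re

# Hypothetical wrapper: an opening <td>, then the group from above repeated,
# then the closing </td>. The outer parens capture the cell contents.
cell = re.compile(r'<td[^>]*>((?:[^/]|/(?!td>))*)</td>')

# Slashes inside the cell are accepted unless they begin "/td>".
row = '<td class="x">a/b <br/> c</td><td>second</td>'
cells = cell.findall(row)
print(cells)  # ['a/b <br/> c', 'second']
```

Each match stops at the first </td> it reaches, even though the repetition is greedy, because the lookahead refuses the slash that starts /td>.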
