On 15 Oct 2007, at 3:40 PM, Ronald J Kimball wrote:

On Mon, Oct 15, 2007 at 11:58:12AM -0700, Greg V. Raven wrote:
OK, I'm stumped. I'm attempting to come up with a GREP pattern that
will find empty HTML tags. In building up to the full pattern, I've
found that this matches the start tag and the white space after it:

<([a-zA-Z]+) *.*?>\s*

The normal pattern for a closing tag seems to be:

</[a-zA-Z]+>

Given that I've captured the opening tag, it seems to me that the
pattern for the closing tag in my overall pattern should be:

<([a-zA-Z]+) ?.*?>\s*</\1>

However, while this pattern finds some empty tags, if I have nested
tags (empty or full), it finds the entire tag string, which is not
correct.

Any thoughts on what I'm missing?

Even though .*? is non-greedy, it can still match across a tag. I think
you want something like this instead:

<([a-zA-Z]+)[^>]*>\s*</\1>
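
A quick way to try the suggested pattern outside BBEdit is sketched below in Python. This assumes Python's re module treats these constructs the same way BBEdit's grep engine does, and the sample HTML string is made up for illustration.

```python
import re

# The pattern suggested above: the attribute part may not cross a ">".
empty_tag = re.compile(r"<([a-zA-Z]+)[^>]*>\s*</\1>")

# Hypothetical sample: one non-empty <p>, two empty elements, all inside a <div>.
html = '<div class="x"><p>text</p><span id="s">  </span><b></b></div>'

# Collect the full text of every match.
found = [m.group(0) for m in empty_tag.finditer(html)]
print(found)
```

Only the span and b elements are reported; the enclosing div and the non-empty p are skipped.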


This had confused me until I thought it through. Maybe the archives will benefit from how I thought it out (or I can benefit from someone clarifying it for me).

The following two regular expressions
.*?>
and
[^>]*>
are equivalent, as far as they go. The first matches zero or more characters, stopping at the earliest point where the pattern following the repetition (>) can match, and then matches >. The second matches zero or more characters, none of which is >, and then matches >. (One difference: by default . does not match a line break, while [^>] does.)
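
The comparison can be checked with a small Python sketch. Python's re module stands in for BBEdit's grep engine here, which is an assumption, and the sample strings are invented. The second half shows the one place the two patterns differ: input containing a line break.

```python
import re

text = '<a href="x">link</a>'

# Both patterns, anchored at the start of the string, stop at the first ">".
lazy = re.match(r".*?>", text).group(0)
negated = re.match(r"[^>]*>", text).group(0)
print(lazy == negated)

# With a line break before the ">", "." cannot get past the newline,
# while the negated class [^>] can.
multiline = "<a\nhref='x'>"
lazy_multiline = re.match(r".*?>", multiline)
negated_multiline = re.match(r"[^>]*>", multiline).group(0)
print(lazy_multiline, repr(negated_multiline))
```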

The following two regular expressions
.*?></
and
[^>]*></
are not equivalent. The second does what you'd expect, matching zero-or-more non->s, then matching >, then <, then /.

The first one matches zero-or-more characters, NOT up to the next >, but UP TO THE NEXT ></. The whole remainder of the pattern has to match before it is determined which sequence of zero-or-more characters will be chosen as the prefix.
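
Here is a minimal Python sketch of that difference, again assuming Python's re behaves like BBEdit's grep for these constructs. Both patterns are anchored at the start of the same made-up string, so the only thing that varies is how far the repetition is allowed to run.

```python
import re

text = "<p><b></b>"

# Anchored at position 0, the lazy version keeps extending past the first ">"
# until the whole remainder "></" can match.
lazy = re.match(r".*?></", text)
print(lazy.group(0))

# The negated class can never consume a ">", so from position 0 the only ">"
# it can reach is the one ending "<p>", which is not followed by "</".
negated = re.match(r"[^>]*></", text)
print(negated)
```

The lazy pattern swallows the first > along the way, while the negated-class pattern simply fails to match from that starting point.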

So
<([a-zA-Z]+) ?.*?>\s*</\1>
matches from the beginning of a tag, through the sequence >\s*</\1>, and it doesn't matter how many >s intervene between the tag and that suffix.
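
To see this on the original pattern, the following Python sketch (Python's re is assumed to stand in for BBEdit's grep, and the sample string is hypothetical) runs it against a nested tag that is not empty at all.

```python
import re

# The original pattern from the question, with the lazy ".*?".
loose = re.compile(r"<([a-zA-Z]+) ?.*?>\s*</\1>")

# Hypothetical sample: <p> is not empty, it contains a <b> element.
html = "<p><b>bold</b></p>"

m = loose.search(html)
print(m.group(0))  # the text the pattern selected
print(m.group(1))  # the tag name captured for \1
```

The match starts at the opening p tag and runs through the closing one, because the > ending </b> is the first > that is followed by </p>.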

        — F


