At 11:25 AM -0500 1/29/07, Ronald J Kimball sent email regarding Re:
Delete Table Column via grep:
On Mon, Jan 29, 2007 at 08:19:25AM -0700, Warren Michelsen wrote:
I'm trying to use a grep find to eliminate the last column in a
table. I'm not able to get grep to find the last instance of
<td.*</td> before a </tr>.
<td.*
will find the opening tag
and
.*</tr> will find just the row closing tag but
<td.*</td>
will not find the entire column.
I'd thought I could find something like:
<td.*</td>.*</tr>
and replace it with </tr>
I've ended up writing a long explanation. If you just want to see my
suggestion for a regex that should work, skip to the bottom. :)
The pattern supplied worked splendidly, and I much appreciate the
detailed explanation. I hope that one day I will not have to ask
such questions.
So, the first thing you need to know is that . matches any character except
a newline. It sounds like your tags are on separate lines, so you need to
allow . to match newlines as well. Use (?s) for this. The 's' is
for single line; it treats the file like it's a single line, so newlines
are just regular characters. That gives us this regex:
(?s)<td.*</td>.*</tr>
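A quick way to see the effect of (?s) outside BBEdit is a small Python
sketch. Python's re module is an assumption here (BBEdit uses its own
PCRE-based grep), but (?s) behaves the same way in both. The sample row is
made up for illustration.

```python
import re

# Hypothetical sample: one table row with each tag on its own line.
row = "<tr>\n<td>a</td>\n<td>b</td>\n</tr>"

# Without (?s), . cannot cross a newline, so nothing matches.
no_flag = re.search(r"<td.*</td>.*</tr>", row)

# With (?s), . matches newlines too, so the pattern can span lines.
with_flag = re.search(r"(?s)<td.*</td>.*</tr>", row)

print(no_flag)                 # None
print(with_flag is not None)   # True
```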
Is it possible, as an alternate strategy, to specify that between <td
and </tr> there may be multiple lines? Treating the entire file as
one long line seems to complicate things. Is there no way to specify
any character at all, including newlines?
The next thing you need to know is that a regular expression will
find the longest, left-most match. (?s)<td.*</td>.*</tr> finds the first
<td in the file, then the last </td> in the file, then the last </tr> in
the file. That's not right, so maybe we want non-greedy quantifiers. Now
we've got:
(?s)<td.*?</td>.*?</tr>
So a '?' is the non-greedy specifier?
When I said that a regular expression will find the longest, left-most
match, that was a generalization. "Longest" actually means
"least-backtracking". For the greedy quantifiers (? + * {n,m}) those are
the same. For the non-greedy quantifiers (?? +? *? {n,m}?) it's the
shortest, left-most match instead.
Unfortunately, in either case it's still the leftmost match.
(?s)<td.*?</td>.*?</tr> finds the first <td in the file, then the first
</td> after that, then the first </tr> after that. So if you've got
multiple <td></td>s, it will match from the first <td> all the way through
to the </tr>. That's not what we want either.
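The difference between the greedy and non-greedy versions can be sketched
in Python (again an assumption; the thread is about BBEdit's grep, and the
two-row table below is invented for the example). Both patterns start at
the first <td; they differ only in where they stop.

```python
import re

# Hypothetical two-row table, two cells per row.
table = (
    "<tr>\n<td>a1</td>\n<td>a2</td>\n</tr>\n"
    "<tr>\n<td>b1</td>\n<td>b2</td>\n</tr>\n"
)

# Greedy: first <td, then the last </td>, then the last </tr>.
greedy = re.search(r"(?s)<td.*</td>.*</tr>", table).group(0)

# Non-greedy: first <td, first </td> after it, first </tr> after that.
lazy = re.search(r"(?s)<td.*?</td>.*?</tr>", table).group(0)

print(greedy.count("<td"))  # 4: the match swallows every cell in both rows
print(repr(lazy))           # '<td>a1</td>\n<td>a2</td>\n</tr>'
```

Neither result is the last cell alone: the greedy match runs to the end of
the table, and the non-greedy match still takes both cells of the first row.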
Too bad there's no way to tell grep to work backwards from a found
point, i.e., find </tr> and select back to the first <td encountered.
This regex should do what you want:
(?s)<td(?:(?!</?td).)*</td>(?:(?!</?td).)*</tr>
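Here is a minimal sketch of that pattern used as a search-and-replace, in
Python rather than BBEdit (an assumption; the lookahead syntax is the same
in both). The sample table and the name LAST_CELL are made up for
illustration. The (?!</?td) lookahead stops each . from stepping over
another <td or </td, so the match can only be one cell followed directly by
</tr>.

```python
import re

# The pattern from the message, compiled once.
LAST_CELL = re.compile(r"(?s)<td(?:(?!</?td).)*</td>(?:(?!</?td).)*</tr>")

# Hypothetical two-row table, two cells per row.
table = (
    "<tr>\n<td>a1</td>\n<td>a2</td>\n</tr>\n"
    "<tr>\n<td>b1</td>\n<td>b2</td>\n</tr>\n"
)

# Replace "last cell plus </tr>" with just "</tr>" in every row.
result = LAST_CELL.sub("</tr>", table)
print(result)
# <tr>
# <td>a1</td>
# </tr>
# <tr>
# <td>b1</td>
# </tr>
```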
Worked just fine. The only drawback is that my Eudora turned various
characters of the pattern into sad smileys. Copied and pasted just
fine though.
Just out of curiosity, suppose I wanted to eliminate column N instead
of the last one. Is grep up to the task? (I'm certainly not.) Such
things as this would be nice to script:
Select a table, choose the "Delete Table Column" script from the
Scripts menu, respond with the number of the column to delete and
execute.
If that can be done, then it ought to be possible to add a column as
well, at column position N, etc.
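For what it's worth, deleting column N is awkward as a single grep pattern
but easy once a script can count cells. The following is only a rough
sketch in Python (not a BBEdit script); delete_column, ROW and CELL are
hypothetical names, and it assumes plain <td> cells with no nested tables,
no colspan and no <th> cells.

```python
import re

# A row is everything from <tr to the next </tr>.
ROW = re.compile(r"(?s)<tr\b.*?</tr>")
# A cell is <td ... </td> with no other td tag inside, plus trailing whitespace.
CELL = re.compile(r"(?s)<td\b(?:(?!</?td\b).)*</td>\s*")

def delete_column(html, n):
    """Remove the n-th <td> cell (1-based) from every row."""
    def strip_cell(row_match):
        row = row_match.group(0)
        cells = list(CELL.finditer(row))
        if n < 1 or n > len(cells):
            return row  # row has fewer than n cells; leave it untouched
        cell = cells[n - 1]
        return row[:cell.start()] + row[cell.end():]
    return ROW.sub(strip_cell, html)

# Hypothetical two-row table, two cells per row.
table = (
    "<tr>\n<td>a1</td>\n<td>a2</td>\n</tr>\n"
    "<tr>\n<td>b1</td>\n<td>b2</td>\n</tr>\n"
)
print(delete_column(table, 1))
# <tr>
# <td>a2</td>
# </tr>
# <tr>
# <td>b2</td>
# </tr>
```

Adding a column at position N would follow the same shape: find the cells
of each row, then splice a new cell in at the n-th cell's start offset.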
Thanks muchly for the explanation and pattern!