Re: Delete Table Column via grep

Warren Michelsen Mon, 29 Jan 2007 09:03:36 -0800

At 11:25 AM -0500 1/29/07, Ronald J Kimball sent email regarding Re: Delete Table Column via grep:

On Mon, Jan 29, 2007 at 08:19:25AM -0700, Warren Michelsen wrote:

I'm trying to use a grep find to eliminate the last column in a table. I'm not able to get grep to find the last instance of <td.*</td> before a </tr>.
 <td.*
 will find the opening tag
 and
 .*</tr>  will find just the row closing tag but

 <td.*</td>
 will not find the entire column.

 I'd thought I could find something like:
 <td.*</td>.*</tr>
 and replace it with </tr>


I've ended up writing a long explanation.  If you just want to see my
suggestion for a regex that should work, skip to the bottom.  :)

The pattern supplied worked splendidly, and I much appreciate the detailed explanation. I hope that one day I will not have to ask such questions.



So, the first thing you need to know is that . matches any character except
a newline.  It sounds like your tags are on separate lines, so you need to
allow . to match newlines as well.  Use (?s) for this.  The 's' is
for single line; it treats the file like it's a single line, so newlines
are just regular characters.  That gives us this regex:
  (?s)<td.*</td>.*</tr>
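The effect of (?s) described above can be checked with a small sketch. This uses Python's re module as a stand-in for BBEdit's grep, on the assumption that (?s) behaves the same way in both; the sample row is made up for illustration.

```python
import re

# Hypothetical sample row with each tag on its own line.
row = "<tr>\n<td>a</td>\n<td>b</td>\n</tr>"

# Without (?s), '.' stops at newlines, so the pattern cannot
# reach from a </td> on one line to the </tr> on another.
without_s = re.search(r"<td.*</td>.*</tr>", row)

# With (?s), '.' also matches newlines, so the pattern can span lines.
with_s = re.search(r"(?s)<td.*</td>.*</tr>", row)

print(without_s)              # no match object is returned
print(repr(with_s.group(0)))  # the text from the first <td through </tr>
```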

Is it possible, as an alternate strategy, to specify that between <td and </tr> there may be multiple lines? Treating the entire file as one long line seems to complicate things. Is there no way to specify any character at all, including newlines?


The next thing you need to know is that a regular expression will
find the longest, left-most match.  (?s)<td.*</td>.*</tr> finds the first
<td in the file, then the last </td> in the file, then the last </tr> in
the file.  That's not right, so maybe we want non-greedy quantifiers.  Now
we've got:
  (?s)<td.*?</td>.*?</tr>
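To see why the greedy version overshoots, here is a rough sketch, again using Python's re module as an assumed stand-in for BBEdit's grep. The two-row sample is hypothetical.

```python
import re

# Hypothetical two-row table, written on one line for brevity.
html = "<tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr>"

# Greedy quantifiers: the match starts at the first <td and
# stretches to the last </td> and the last </tr> in the text.
greedy = re.search(r"(?s)<td.*</td>.*</tr>", html)

print(greedy.group(0))
```

Everything after the opening <tr> of the first row is swallowed, including the whole second row.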


So a '?' is the non-greedy specifier?


When I said that a regular expression will find the longest, left-most
match, that was a generalization.  "Longest" actually means
"least-backtracking".  For the greedy quantifiers (? + * {n,m}) those are
the same.  For the non-greedy quantifiers (?? +? *? {n,m}?) it's the
shortest, left-most match instead.

Unfortunately, in either case it's still the leftmost match.
(?s)<td.*?</td>.*?</tr> finds the first <td in the file, then the first
</td> after that, then the first </tr> after that.  So if you've got
multiple <td></td>s, it will match from the first <td> all the way through
to the </tr>.  That's not what we want either.
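The same kind of sketch shows the non-greedy pattern still starting too far to the left. Python's re module is assumed here to behave like BBEdit's grep for these quantifiers, and the sample text is made up.

```python
import re

# Hypothetical two-row table, written on one line for brevity.
html = "<tr><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr>"

# Non-greedy quantifiers: the match still begins at the FIRST <td,
# then takes the nearest </td> and the nearest </tr> after it.
non_greedy = re.search(r"(?s)<td.*?</td>.*?</tr>", html)

print(non_greedy.group(0))
```

The match covers both cells of the first row, not just the last one.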

Too bad there's no way to tell grep to work backwards from a found point, i.e., find </tr> and select back to the first <td encountered.



This regex should do what you want:

(?s)<td(?:(?!</?td).)*</td>(?:(?!</?td).)*</tr>
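As a quick check of this pattern outside BBEdit, here is a minimal sketch in Python, assuming its re module handles (?s) and the (?!...) lookahead the same way. The three-column table is hypothetical, and the replacement string is the </tr> suggested earlier in the thread.

```python
import re

# The pattern from the message above. (?:(?!</?td).)* consumes one
# character at a time, but only while no <td or </td begins there,
# so the match cannot run across another cell.
LAST_CELL = r"(?s)<td(?:(?!</?td).)*</td>(?:(?!</?td).)*</tr>"

# Hypothetical table with three columns and each tag on its own line.
table = (
    "<table>\n"
    "<tr>\n<td>a</td>\n<td>b</td>\n<td>c</td>\n</tr>\n"
    "<tr>\n<td>d</td>\n<td>e</td>\n<td>f</td>\n</tr>\n"
    "</table>"
)

# Replace the last cell plus the row close with just the row close.
result = re.sub(LAST_CELL, "</tr>", table)

print(result)
```

Only the last cell of each row ("c" and "f") is removed; the earlier cells are left alone because the lookahead refuses to step over another td tag.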

Worked just fine. The only drawback is that my Eudora turned various characters of the pattern into sad smileys. Copied and pasted just fine though.

Just out of curiosity, suppose I wanted to eliminate column N instead of the last one. Is grep up to the task? (I'm certainly not.) Such things as this would be nice to script: Select a table, choose the "Delete Table Column" script from the Scripts menu, respond with the number of the column to delete and execute.

If that can be done, then it ought to be possible to add a column as well, at column position N, etc.


Thanks muchly for the explanation and pattern!



--
------------------------------------------------------------------
Have a feature request? Not sure the software's working correctly?
If so, please send mail to <[EMAIL PROTECTED]>, not to the list.
List FAQ: <http://www.barebones.com/support/lists/bbedit_talk.shtml>
List archives: <http://www.listsearch.com/BBEditTalk.lasso>
To unsubscribe, send mail to:  <[EMAIL PROTECTED]>
