That is true, and my apologies for not including a suitable
caveat in my prior post. That said, it can still be
helpful to start with a simple solution and work out from there. :-)
Regards,
-- Patrick
On 4/26/19 at 4:21 PM, [email protected] (Sam Hathaway) wrote:
I’m not sure this can be made reliable. Regular expressions can’t balance
nested tags, so this input:
```html
<div>
Leave me out!
<div class="main-content">
Include me!
<div>And me!</div>
And also me!
</div>
But not me.
</div>
```
With the non-greedy pattern, the result will be:
```html
<div class="main-content">
Include me!
<div>And me!</div>
```
If you make the pattern greedy, you’ll get:
```html
<div class="main-content">
Include me!
<div>And me!</div>
And also me!
</div>
But not me.
</div>
```
That’s just fine if you don’t have any DIVs inside your main-content DIV, but how
likely is that?
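The two outcomes above can be reproduced with a small sketch. This uses Python's `re` module rather than BBEdit's Grep engine, and only the sub-pattern part of the Find pattern, so treat it as an illustration of lazy versus greedy matching and not as a copy of BBEdit's behaviour. The variable names are made up for the example.

```python
import re

html = """<div>
Leave me out!
<div class="main-content">
Include me!
<div>And me!</div>
And also me!
</div>
But not me.
</div>
"""

# Lazy (.+?): stops at the first </div>, which closes the nested div.
lazy = re.search(r'<div class="main-content">.+?</div>', html, re.DOTALL).group(0)

# Greedy (.+): runs on to the last </div>, which closes the outer div.
greedy = re.search(r'<div class="main-content">.+</div>', html, re.DOTALL).group(0)

print(lazy)
print("-----")
print(greedy)
```

Neither variant stops at the closing tag that actually belongs to the main-content div, which is the point being made above.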
Better off using a tool that’s designed to manipulate HTML.
Some options [here](https://superuser.com/questions/528709/command-line-css-selector-tool/528728).
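As one possible illustration of the "tool designed for HTML" approach, here is a minimal sketch using only `html.parser` from Python's standard library. It tracks div nesting depth, so it stops at the closing tag that really matches the target div. The class name `DivExtractor` and the function `extract_div` are hypothetical names for this example, and the sketch assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser


class DivExtractor(HTMLParser):
    """Collects the source of the first <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.target_class = target_class
        self.depth = 0      # div nesting depth inside the target; 0 means outside
        self.parts = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and self.target_class in classes:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
            return
        if tag == "div":
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change the depth.
        if self.depth and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or self.depth == 0:
            return
        self.parts.append(f"</{tag}>")
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # matching close of the target div

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        if self.depth:
            self.parts.append(f"<!--{data}-->")


def extract_div(html, target_class="main-content"):
    """Return the target div with its balanced contents, or None if absent."""
    parser = DivExtractor(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None


sample = """<div>
Leave me out!
<div class="main-content">
Include me!
<div>And me!</div>
And also me!
</div>
But not me.
</div>
"""

print(extract_div(sample))
```

On the nested sample from earlier in this message, this keeps "And also me!" and drops "But not me.", which is the result neither regex variant can give.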
It’d be lovely if BBEdit could allow find/replace based on
CSS selectors or XPath expressions in addition to text and
regexps. But presumably that would be a large undertaking.
Just my 2¢.
-sam
On 26 Apr 2019, at 16:11, Patrick Woolsey wrote:
On 4/26/19 at 3:07 PM, [email protected] (Phil Emery) wrote:
Ideally it would be great to delete all other content from
each existing file.
OK, thanks. In that case, you should be able to obtain the
desired outcome by performing a multi-file search & replace
with "Grep" enabled and patterns like these:
Find: \A(?s).+?(<div class="main-content">(?s).+?</div>)(?s).+
Replace: \1
In short, here's how the patterns work:
The Find pattern begins by matching at the start of the
document (\A), then _non-greedily_ (the trailing ?) matches one
or more instances of any character (.+), _including_ line breaks
(achieved by prepending (?s) to the .). That is followed by a
single _sub-pattern_, enclosed in parentheses ( ), which consists
of the opening div, one or more characters in another non-greedy
match across lines, and the first closing div that follows.
Finally, (?s).+ matches all characters remaining in the
document, including line breaks.
[NB: You'll need to adjust the exact form of the desired <div>
to suit your content, i.e. depending on whether these sections
are identified by 'class', 'name', or 'id'.]
The Replace pattern then reinserts only the contents of the
matched subpattern (consisting of the desired div, its
contents, and the closing div), thus effectively deleting
everything else.
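For anyone who wants to try the same idea outside BBEdit, here is a rough Python equivalent of the Find and Replace patterns. It is a sketch under a few assumptions: Python's `re` module stands in for BBEdit's Grep engine, the inline (?s) switches are replaced by the `re.DOTALL` flag (recent Python versions reject inline global flags in the middle of a pattern), and `keep_main_content` and `page` are made-up names.

```python
import re

# Same shape as the Find pattern: start of text, lazy run, captured div, the rest.
PATTERN = re.compile(
    r'\A.+?(<div class="main-content">.+?</div>).+',
    re.DOTALL,  # let . match line breaks, like (?s)
)


def keep_main_content(text):
    """Replace the whole document with the captured div; no match leaves text unchanged."""
    return PATTERN.sub(r"\1", text, count=1)


page = (
    "<html><body>\n"
    "<p>Header</p>\n"
    '<div class="main-content">\n'
    "Keep me.\n"
    "</div>\n"
    "<p>Footer</p>\n"
    "</body></html>\n"
)

result = keep_main_content(page)
print(result)
```

Note that both .+ runs need at least one character, so the pattern only matches when something precedes and follows the div.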
As always, I recommend you try this procedure out on a few
sample files or a cloned copy before applying it to your
actual data, just to make sure it's doing what you
expect/want. :-)
Regards,
Patrick Woolsey
==
Bare Bones Software, Inc. <https://www.barebones.com/>
--
This is the BBEdit Talk public discussion group. If you have a
feature request or need technical support, please email
"[email protected]" rather than posting to the group.
Follow @bbedit on Twitter: <https://www.twitter.com/bbedit>
--- You received this message because you are subscribed to
the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/bbedit.