Re: Find <h2> followed by many lines of arbitrary HTML through next <h2> but exclude the second <h2>

ctfishman Tue, 14 Sep 2021 22:51:34 -0700

I tried doing this with just a regular expression but couldn't figure out 
how. I was, however, able to do it quite easily with a text filter. The 
following Perl example works for me: it splits the text and saves an 
individual file for each chapter.


Save the following in your text filters folder and run it against your 
document.

--------------------------

#!/usr/bin/perl

# Read each line into a scalar, then print it back

my $fullstring;

while (<>) {
    $fullstring .= $_;
    print;
}

# Split the scalar into an array

my @h2s = split( /<h2>/, $fullstring );

# Delete the first item of the array, which will be empty because our text
# starts with "<h2>"

shift @h2s;

# add back the "<h2>" at the start of each array element
# which was removed when we did the split

foreach my $string (@h2s) {
    $string = "<h2>" . $string;
}

# Now the array contains each of your chapters, one per element.
# The following will create a new directory on your desktop called "Chapters"
# (if it doesn't exist already) and save a new document with the text from
# each chapter/array element. The original document will be the same as when
# it started, because we printed each line back out after we read it.

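my $counter = 1;

# Perl's open() does not expand "~", so build the path from $HOME.
my $dir = "$ENV{HOME}/Desktop/Chapters";

system( "mkdir", "-p", $dir ) == 0 or die "Cannot create $dir: $?";

for my $chapter (@h2s) {
    my $file = "$dir/chapter$counter.html";
    open( my $out, ">", $file ) or die "Cannot write $file: $!";
    print $out $chapter;
    close($out);
    $counter++;
}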
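
A possible shortcut for the split-and-add-back steps above, as a minimal 
sketch only: splitting on a zero-width lookahead keeps the "<h2>" on each 
piece, so nothing has to be shifted off or added back. The sub name 
split_chapters and the sample text are made up for illustration and are not 
part of the filter above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a document into chapters at each "<h2>".
# The lookahead (?=<h2>) matches the position just before "<h2>" without
# consuming it, so every chapter keeps its own opening tag.
sub split_chapters {
    my ($html) = @_;
    my @parts = split /(?=<h2>)/, $html;

    # Drop any text that comes before the first "<h2>".
    return grep { /\A<h2>/ } @parts;
}

# Made-up sample text standing in for the novel.
my $sample = "<h2>One</h2>\n<p>a</p>\n<h2>Two</h2>\n<p>b</p>\n";

my @chapters = split_chapters($sample);
print scalar(@chapters), " chapters\n";
print "---\n$_" for @chapters;
```

The resulting list could be written out with the same loop as in the filter. 
Unlike the shift in the filter, the grep only discards a leading piece when 
it does not begin with "<h2>".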

On Tuesday, September 14, 2021 at 6:17:42 PM UTC-4 sonic...@gmail.com wrote:

> My fiction writing workflow initially produces one HTML document with the 
> entire novel’s content. Each chapter starts with <h2>Exciting Chapter Title 
> Here</h2> then many paragraphs of story text with arbitrary HTML markup. I 
> split each chapter into its own HTML page, containing everything from that 
> first <h2> with the chapter title through the end of the chapter, which is 
> always immediately before the subsequent opening <h2> for the following 
> chapter (in the original un-split document).
>
> Working manually, i’ve been using the Grep Find:
>
> <h2>([\s\S]+?)<h2>
>
> This works perfectly, other than it includes the <h2> at the start of the 
> next chapter in the selection i’m about to cut or copy into a new HTML 
> document. I manually back off the selection to include everything found 
> minus that ending <h2>. I would like to better automate my workflow, but 
> can’t with the need for this manual adjustment.
>
> Re-reading the Grep help file with BBEdit, i thought lookahead might help. 
> I tried:
>
> <h2>([\s\S]+?)(?<h2>)
>
> but that just finds the first <h2> and one character immediately following 
> it. Noticing that BBEdit is highlighting the < for that second <h2>, i 
> tried escaping it:
>
> <h2>([\s\S]+?)(?\<h2>)
>
> This throws a PCRE error: unrecognized character after (? or (?- (12)
>
> Can anyone suggest a search string that will accomplish my goal?
>
> (BBEdit 11.6.8 running under macOS 10.12.6 Sierra.)
>
> Thanks!
>
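
On the lookahead attempts quoted above: in PCRE (and in Perl) a positive 
lookahead is written (?=...), so the "?" has to be followed by "=". A 
pattern along the lines of <h2>[\s\S]+?(?=<h2>|\z) should therefore stop 
just before the next "<h2>". The |\z alternative is an addition so that the 
last chapter, which has no "<h2>" after it, still matches. This has not been 
verified in BBEdit 11.6.8; the sketch below only checks the pattern in Perl, 
with a made-up sub name (find_chapters) and sample text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every chapter matched by the lookahead pattern.
sub find_chapters {
    my ($html) = @_;

    # (?=<h2>|\z) asserts that "<h2>" or the end of the text comes next,
    # without including it in the match. Call this in list context.
    return $html =~ /<h2>[\s\S]+?(?=<h2>|\z)/g;
}

# Made-up sample text standing in for the novel.
my $sample = "<h2>One</h2><p>a</p><h2>Two</h2><p>b</p>";

print "$_\n" for find_chapters($sample);
```

Each match begins at a chapter's own "<h2>" and ends immediately before the 
following one, so no manual adjustment of the selection should be needed.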

-- 
This is the BBEdit Talk public discussion group. If you have a feature request 
or need technical support, please email "supp...@barebones.com" rather than 
posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
