Hello all!
I want to clean up a film script that is in bad HTML shape. I have
already replaced nearly everything that was formatted with <pre> </pre>,
a lot of white space and line breaks. What remains are the many passages
of actors' text that hang between a closing </p> and the next <p tag.
To give an example, here is the original:
...
</p>
Aaron is laying on the couch, looking drained. Ryan comes down
the stairs. It looks like he's had some time to recover. He
walks over and takes a seat.
<p class="mitte_bold">AARON
</p> How's Mom?
<p class="mitte_bold">RYAN
</p> She's resting.
A beat.
<p class="mitte_bold">
...
This has to become something like:
***
</p>
<p class="links_normal">
Aaron is laying on the couch, looking drained. Ryan comes down
the stairs. It looks like he's had some time to recover. He
walks over and takes a seat.
</p>
<p class="mitte_bold">AARON
</p> How's Mom?
<p class="mitte_bold">RYAN
</p>
<p class="links_normal">She's resting.
A beat.
</p>
<p class="mitte_bold">
***
I don't mind the formatting between the tags; BBEdit will take care of
that at the end.
I have set up a Perl filter, but the problem is that it does not
iterate over the many occurrences of </p>\s+[^<]+?<p
Could somebody help me out?
Thank you in advance! marek
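#!/usr/bin/perl
use strict;
use warnings;
# slurp mode: read the whole input as one string
$/ = undef;
$_ = <>;
# look at each stretch of loose text between </p> and the next <p
foreach ($_ =~ m,(</p>\s+[^<]+?<p),g) {
my $paragraf = $1;
# open a new paragraph right after the closing tag
$paragraf =~ s,</p>,$&<p class="links_normal">,;
# close it again before the next opening tag
$paragraf =~ s,\n\n<p$,</p><p,g;
# turn the remaining line breaks into <br>
$paragraf =~ s,\n,<br>,g;
# drop runs of two or more white-space characters
$paragraf =~ s!\s{2,}!!g;
print $paragraf;
$paragraf = ();
}
print;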
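
A possible explanation for the behaviour of the filter above: in list context m//g returns all captured strings at once, so the foreach walks over them in $_, while $1 does not step through the matches, and the final print emits the untouched input. Below is a minimal sketch of another approach, assuming one global substitution over the whole text is acceptable. The helper name wrap_loose_text and the sample string are made up for illustration; only the class name "links_normal" comes from the post. It does not convert line breaks to <br>, since the formatting between the tags is left to BBEdit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap every run of loose text that sits between
# a closing </p> and the next opening <p in its own paragraph.
sub wrap_loose_text {
    my ($html) = @_;
    # $1 starts at the first non-blank character after </p> and stops
    # before the white space in front of the next <p tag.
    # (?=<p\b) is a lookahead, so the opening tag itself stays in place.
    $html =~ s{</p>\s*([^<\s][^<]*?)\s*(?=<p\b)}
              {</p>\n<p class="links_normal">$1</p>\n}g;
    return $html;
}

# Sample input modelled on the excerpt in the post.
my $sample = <<'END';
</p>
Aaron is laying on the couch, looking drained.
<p class="mitte_bold">AARON
</p> How's Mom?
<p class="mitte_bold">RYAN
</p> She's resting.
A beat.
<p class="mitte_bold">
END

print wrap_loose_text($sample);
```

To use it as a filter, the sample could be replaced by the slurped input ($/ = undef; $_ = <>;) and the result printed once. Note that this sketch wraps every loose run, including " How's Mom?", which the hand-made target above leaves unwrapped.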