Is there a way to strip the HTML tags when a tag spans multiple lines?
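
One possible approach, sketched here rather than taken from the thread: read the whole file into a single string instead of looping line by line, so the same substitutions used in the loop quoted below can see a tag that is split across lines. The helper name strip_tags and the command-line handling are assumptions made for illustration. Plain regexes like these still trip over a ">" inside an attribute value, comments, and script blocks.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (not from the original thread): strip tags from a
# complete HTML document held in one string.
sub strip_tags {
    my ($html) = @_;
    $html =~ s/\s+/ /g;             # fold newlines and whitespace runs into one space
    $html =~ s/<p\b[^>]*>/\n\n/gi;  # paragraph start becomes a blank line
    $html =~ s/<br\b[^>]*>/\n/gi;   # <br> becomes a line break
    $html =~ s/<[^>]+>//g;          # remove every remaining tag
    return $html;
}

# Only touch a file when a name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Can't open $file: $!";
    local $/;                       # no record separator: read the file in one go
    my $html = <$fh>;
    close $fh;
    print strip_tags($html), "\n";
}
```

Saved under a name such as strip_tags.pl (hypothetical), it could be run as "perl strip_tags.pl page.html". For anything beyond simple pages, a real parser from CPAN such as HTML::Parser is likely the safer choice.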
>From: "Sascha Kersken" <[EMAIL PROTECTED]>
>To: Sparkle Williams <[EMAIL PROTECTED]>
>CC: [EMAIL PROTECTED]
>Subject: Re: perl and internet files
>Date: Thu, 19 Jul 2001 16:07:01 +0200
>
>Hi!
>
>There's the wonderful recipe 20.5 ("Converting HTML to ASCII") in Chapter 20
>("Web Automation") of the "Perl Cookbook" (by Tom Christiansen and Nathan
>Torkington, from O'Reilly).
>
>A basic way to strip the HTML tags and replace <br> and <p> tags with line
>breaks might be something like:
>
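>#!/usr/bin/perl -w
>
># "or" binds more loosely than "||", so die really fires if open fails
>open HTMLFILE, "<the_html_file's_name" or die "Can't open that: $!";
>while (<HTMLFILE>)
>{
> chomp;
> s/<p\b[^>]*>/\n\n/gi;   # <p> or <p ...> (but not <pre>) becomes a blank line
> s/<br\b[^>]*>/\n/gi;    # <br> or <br ...> becomes a line break
> s/<[^>]+>//g;           # drop every remaining tag
> print;
>}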
>
>- but be careful! This won't work when a tag is split across several lines,
>because the loop only ever sees one line at a time!
>
>
>Sascha
>
>----------
> >From: "Sparkle Williams" <[EMAIL PROTECTED]>
> >To: [EMAIL PROTECTED]
> >Subject: perl and internet files
> >Date: Thu, 19 Jul 2001 15:51
> >
>
> > Good morning!
> > I just wrote a perl program that retrieves files of type http:// and
> > ftp:// from the internet. When it retrieves the files, they come up in
> > the HTML syntax of head, body, text etc. Is there any way I can write an
> > addition to my script that will cause the text to come up in its
> > formatted form rather than the HTML syntax describing its format?
> >
> >
> >
> > --
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
>
>