Just an FYI:
Note that get returns the whole document as a single string, so its result can be written directly to a scalar.
$req = HTTP::Request->new(GET => 'http://www.bradenton.com/mld/bradenton/rss/9837290.htm');
That eliminates the need for the join command.
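
A minimal sketch of why the scalar form also makes failure checks easier, assuming get hands back one scalar (undef on failure). fake_get is a hypothetical stand-in so this runs without network access; it is not part of LWP::Simple. It mimics the "my @lines = get ... or die" idiom from the script quoted below.

```perl
use strict;
use warnings;

# Hypothetical stand-in for LWP::Simple::get on a failed fetch:
# it returns a single scalar, and that scalar is undef.
sub fake_get { return undef }

# List assignment in boolean context yields the number of right-hand
# elements (one here, the undef), so the "or" branch is never taken.
my $or_branch_taken = 0;
my @lines = fake_get() or $or_branch_taken = 1;

# Assigning to a scalar lets the failure be tested directly.
my $content = fake_get();
my $failure_seen = defined $content ? 0 : 1;

print "elements in \@lines: ", scalar(@lines), "\n";
print "or branch taken: $or_branch_taken\n";
print "failure seen with scalar: $failure_seen\n";
```

With a scalar, "die ... unless defined $content" catches a failed fetch, which the array form silently lets through.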

Basil



From: "$Bill Luebkert" <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
Date: 06/10/2004 07:15 PM
To: Gary Nielson <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED]
Subject: Re: Question about parsing an html document

Gary Nielson wrote:

> I am trying to get the first paragraph of an article from an html document.
> I am trying to do this by getting the document from the web, using 'join' to
> make many lines one line, and then trying to isolate the text I want. Is
> this workable?
>
> Here's an example of the area of a longer html document that I am trying to
> parse. (The dateline classes do not appear in all articles. I figure I can
> get rid of the remaining tags later in the script.)
>
> </div>
> <span class="body-content"><!-- begin body-content -->
> <p><b><span class="dateline">SARASOTA</span><span
> class="dateline-separator"> - </span></b>As the search for Carlie Brucia
> intensified, hundreds of leads helped sketch the portrait of the suspect who
> authorities say abducted the 11-year-old from a car wash parking lot in
> February.</p>
> <p>According to the 615 pages of tips and leads released by the State
> Attorney's Office on Tuesday
>
> Here's my script, which returns 'no match':
>
> use LWP::Simple;
> my @lines = get( "http://www.bradenton.com/mld/bradenton/rss/9837290.htm" )
> or die $!;
>
>     $line = join "", @lines if defined @lines;
>     if ($line =~ /<\!-- begin body-content -->(.*)\/p>/i)
>         {
>         print $1;
>         } else
>                 {
>                 print 'no match';
>                 }
>

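Barring the use of an HTML parser, try this. The /s modifier lets . match across newlines, and the non-greedy .*? stops at the first closing p tag instead of the last: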

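use strict;
use warnings;
use LWP::Simple;

# get returns the whole page as one string, or undef on failure.
# It does not set $!, so report the URL instead.
my $url = "http://www.bradenton.com/mld/bradenton/rss/9837290.htm";
my $line = get $url;
die "get failed: $url\n" unless defined $line;

# /s lets . match newlines; .*? stops at the first </p>.
if ($line =~ /<!-- begin body-content -->(.*?)<\/p>/is) {
    print "$1\n";
} else {
    print "No match\n";
}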

__END__
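
To see the effect of /s and the non-greedy match without fetching anything, here is a small sketch that runs both patterns against the sample markup from the question. first_paragraph is a hypothetical helper name, and the tag stripping is a crude regex that is only meant for this sample, not a substitute for an HTML parser.

```perl
use strict;
use warnings;

# Sample markup copied from the question, held in one string with its
# newlines, the way get would return it.
my $html = <<'HTML';
</div>
<span class="body-content"><!-- begin body-content -->
<p><b><span class="dateline">SARASOTA</span><span
class="dateline-separator"> - </span></b>As the search for Carlie Brucia
intensified, hundreds of leads helped sketch the portrait of the suspect who
authorities say abducted the 11-year-old from a car wash parking lot in
February.</p>
<p>According to the 615 pages of tips and leads released by the State
Attorney's Office on Tuesday
HTML

# Hypothetical helper: return the first paragraph as plain text,
# or nothing if the marker comment is not found.
sub first_paragraph {
    my ($text) = @_;
    # /s lets . cross newlines; .*? stops at the first </p>.
    return unless $text =~ /<!-- begin body-content -->(.*?)<\/p>/is;
    my $para = $1;
    $para =~ s/<[^>]*>//g;   # crude tag removal, for this sample only
    $para =~ s/\s+/ /g;      # collapse line breaks and runs of spaces
    $para =~ s/^ | $//g;     # trim leading and trailing space
    return $para;
}

# The pattern from the question: without /s, . cannot cross the newline
# that follows the marker comment, so it finds nothing.
my $old_matches = $html =~ /<\!-- begin body-content -->(.*)\/p>/i ? 1 : 0;

print "original pattern matches: $old_matches\n";
print first_paragraph($html), "\n";
```

The dateline spans are removed along with the other tags, so articles with or without a dateline go through the same path.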


--
 ,-/-  __      _  _         $Bill Luebkert    Mailto:[EMAIL PROTECTED]
(_/   /  )    // //       DBE Collectibles    Mailto:[EMAIL PROTECTED]
 / ) /--<  o // //      Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_</_</_    http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
_______________________________________________
ActivePerl mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
