Thanks! 

-----Original Message-----
From: $Bill Luebkert [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 06, 2004 8:15 PM
To: Gary Nielson
Cc: [EMAIL PROTECTED]
Subject: Re: Question about parsing an html document

Gary Nielson wrote:

> I am trying to get the first paragraph of an article from an html
> document.
> I am trying to do this by getting the document from the web, using 
> 'join' to make many lines one line, and then trying to isolate the 
> text I want. Is this workable?
> 
> Here's an example of the area of a longer html document that I am 
> trying to parse. (The dateline classes do not appear in all articles. 
> I figure I can get rid of the remaining tags later in the script.)
> 
> </div>
> <span class="body-content"><!-- begin body-content --> <p><b><span 
> class="dateline">SARASOTA</span><span
> class="dateline-separator"> - </span></b>As the search for Carlie 
> Brucia intensified, hundreds of leads helped sketch the portrait of 
> the suspect who authorities say abducted the 11-year-old from a car 
> wash parking lot in February.</p> <p>According to the 615 pages of 
> tips and leads released by the State Attorney's Office on Tuesday
> 
> Here's my script, which returns 'no match':
> 
> use LWP::Simple;
> my @lines = get( 
> "http://www.bradenton.com/mld/bradenton/rss/9837290.htm" ) or die $!;
> 
>     $line = join "", @lines if defined @lines;
>     if ($line =~ /<\!-- begin body-content -->(.*)\/p>/i)
>         {
>         print $1;
>         } else
>                 {
>                 print 'no match';
>                 }
> 

Barring the use of an HTML parser, something like this should work:

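use strict;
use warnings;
use LWP::Simple;

# get() returns the whole page as a single string (newlines included),
# or undef on failure; it does not set $!, and no join is needed.
my $line = get "http://www.bradenton.com/mld/bradenton/rss/9837290.htm";
defined $line or die "get failed";

# /s lets . match newlines; .*? stops at the first closing </p>.
if ($line =~ /<!-- begin body-content -->(.*?)<\/p>/is) {
        print "$1\n";
} else {
        print "No match\n";
}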

__END__
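
To see why the original pattern reported 'no match', here is a small offline sketch. It assumes a made-up sample string abbreviated from the HTML quoted above (the text "came in" and the closing tag on the second paragraph are mine, added so there are two paragraphs to tell apart). It compares the pattern without /s, with /s but greedy, and with /s and non-greedy, then strips the leftover tags with a crude substitution. That last step is only a rough cleanup, not a substitute for a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical sample, abbreviated from the HTML quoted in the question.
my $html = <<'HTML';
<span class="body-content"><!-- begin body-content --> <p><b><span
class="dateline">SARASOTA</span><span
class="dateline-separator"> - </span></b>As the search intensified,
hundreds of leads came in.</p> <p>According to the 615 pages of
tips and leads</p>
HTML

my $marker = qr/<!-- begin body-content -->/i;

# 1. No /s: "." cannot cross a newline, so the match fails.
my ($plain)  = $html =~ /$marker(.*)<\/p>/;

# 2. /s but greedy: runs on to the LAST </p> in the string.
my ($greedy) = $html =~ /$marker(.*)<\/p>/s;

# 3. /s and non-greedy: stops at the FIRST </p>.
my ($lazy)   = $html =~ /$marker(.*?)<\/p>/s;

# Crude cleanup: drop remaining tags, collapse whitespace, trim.
(my $text = $lazy) =~ s/<[^>]+>//g;
$text =~ s/\s+/ /g;
$text =~ s/^ | $//g;

print defined $plain ? "plain: matched\n" : "plain: no match\n";
print "greedy captured ", length($greedy), " chars\n";
print "lazy captured ", length($lazy), " chars\n";
print "text: $text\n";
```

The greedy capture swallows the second paragraph as well, which is why the ? after .* matters as much as the /s modifier.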


-- 
  ,-/-  __      _  _         $Bill Luebkert    Mailto:[EMAIL PROTECTED]
 (_/   /  )    // //       DBE Collectibles    Mailto:[EMAIL PROTECTED]
  / ) /--<  o // //      Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_</_</_    http://dbecoll.tripod.com/ (My Perl/Lakers stuff)


_______________________________________________
ActivePerl mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
