Thanks!

-----Original Message-----
From: $Bill Luebkert [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 06, 2004 8:15 PM
To: Gary Nielson
Cc: [EMAIL PROTECTED]
Subject: Re: Question about parsing an html document

Gary Nielson wrote:

> I am trying to get the first paragraph of an article from an html document.
> I am trying to do this by getting the document from the web, using
> 'join' to make many lines one line, and then trying to isolate the
> text I want. Is this workable?
>
> Here's an example of the area of a longer html document that I am
> trying to parse. (The dateline classes do not appear in all articles.
> I figure I can get rid of the remaining tags later in the script.)
>
> </div>
> <span class="body-content"><!-- begin body-content --> <p><b><span
> class="dateline">SARASOTA</span><span
> class="dateline-separator"> - </span></b>As the search for Carlie
> Brucia intensified, hundreds of leads helped sketch the portrait of
> the suspect who authorities say abducted the 11-year-old from a car
> wash parking lot in February.</p> <p>According to the 615 pages of
> tips and leads released by the State Attorney's Office on Tuesday
>
> Here's my script, which returns 'no match':
>
> use LWP::Simple;
> my @lines = get(
> "http://www.bradenton.com/mld/bradenton/rss/9837290.htm" ) or die $!;
>
> $line = join "", @lines if defined @lines;
> if ($line =~ /<\!-- begin body-content -->(.*)\/p>/i)
> {
> print $1;
> } else
> {
> print 'no match';
> }
>

Barring the use of an HTML parser:

use strict;
use warnings;
use LWP::Simple;

# get() returns the whole page as a single string, or undef on failure.
# It does not set $!, so test the return value directly.
my $line = get("http://www.bradenton.com/mld/bradenton/rss/9837290.htm");
die "get failed\n" unless defined $line;

# /s lets . match newlines; .*? stops at the first closing </p>.
if ($line =~ /<!-- begin body-content -->(.*?)<\/p>/is) {
	print "$1\n";
} else {
	print "No match\n";
}

__END__

-- 
  ,-/-  __      _  _         $Bill Luebkert    Mailto:[EMAIL PROTECTED]
 (_/   /  )    // //       DBE Collectibles    Mailto:[EMAIL PROTECTED]
  / ) /--<  o // //      Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_</_</_    http://dbecoll.tripod.com/ (My Perl/Lakers stuff)
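
The effect of the two regex changes in the reply (the /s modifier and the non-greedy .*?) can be checked offline. What follows is only a minimal sketch: it assumes a shortened copy of the quoted HTML in place of the live page, first_paragraph is a hypothetical helper name that is not part of LWP::Simple or the original script, and the tag stripping is deliberately crude.

```perl
use strict;
use warnings;

# Assumption: a shortened stand-in for the fetched page, so no network is needed.
my $html = <<'HTML';
</div>
<span class="body-content"><!-- begin body-content -->
<p><b><span class="dateline">SARASOTA</span><span
class="dateline-separator"> - </span></b>As the search
intensified, hundreds of leads came in.</p>
<p>According to the 615 pages of tips and leads</p>
HTML

# Hypothetical helper: return the first paragraph after the marker
# comment with tags removed, or nothing if the marker is not found.
sub first_paragraph {
    my ($text) = @_;

    # /s lets . cross line breaks; .*? stops at the first </p>
    # instead of running on to the last one in the document.
    return unless $text =~ m{<!-- begin body-content -->\s*<p>(.*?)</p>}is;
    my $para = $1;

    $para =~ s/<[^>]+>//g;     # crude tag removal, enough for this markup
    $para =~ s/\s+/ /g;        # collapse line breaks and runs of spaces
    $para =~ s/^\s+|\s+$//g;   # trim both ends
    return $para;
}

print first_paragraph($html), "\n";
```

Without /s the match fails as soon as the paragraph spans more than one line, which is consistent with the 'no match' result in the original script. For markup less regular than this, a real HTML parser module such as HTML::Parser is likely to be more robust than a regex.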
