I'm trying to write a script with XML::RSS. Using an example I found on the web, I've gotten it to pull headlines and links out of an HTML document. But I am having difficulty understanding how to also get the lead paragraphs into a $desc variable. I have looked at the man pages and searched the web, but I may be missing something, because my Perl skills go only so far and I'm new to RSS.
I've included my script so far below, along with two examples of the text I am working with. I have added HEADLINE:, SUBHEAD: and LEAD: labels to show where each part is. As you will see, not every story contains a subhead, but every story contains a headline and a lead paragraph. Besides the headline and link, I want to use the subhead as the description when there is one, and the lead paragraph when there is not. Clearly, the time to look for this is while going through the HTML line by line looking for the headline and link, but I'm having trouble finding examples of 1) how XML::RSS looks for descriptions and 2) how to include them in $rss->add_item. Any help is much appreciated.

Text example #1:

<p>
<a href="9864175.htm" class="digest-headline">HEADLINE: Girl dies from apparent overdose</a><br />
<span class="digest-headline2">SUBHEAD: Family and friends say the 17-year-old, who was in a coma for 24 days, waged a battle against drug addiction</span><br />
LEAD: Seventeen-year old Amberly Dana Gray of Bradenton died in a St. Petersburg hospital late Wednesday after never regaining consciousness from an apparent cocaine overdose, a relative said.<br>

Text example #2:

<p>
<a href="9864168.htm" class="digest-headline">HEADLINE: Probe: Cost-cutting hurt youths' care</a><br />
LEAD: A probe into health care at a Miami boarding school for delinquents has found that children sometimes received poor medical care because school officials worried about costs.<br>

Script:

#!/usr/bin/perl -w
use XML::RSS;
use LWP::Simple;
use strict;
use vars qw($today $mday $wday $yday $isdst $length $mon $month $year $sec $min $hour $pubDate $js);

my @pub = ('bradenton','centredaily');

foreach my $pub (@pub) { # start foreach pub loop (#1)
    print "Processing publication: $pub....\n";
    my $url = "http://www." . "$pub" . ".com/mld/" . "$pub" . "/rss/";
    print "Getting content from url: $url\n";

    # get() returns undef on failure and does not set $!
    my $content = get($url);
    die "Couldn't get $url\n" unless defined $content;

    my $headlines_file = "temp/" . "$pub" . ".tmp";
    my $rss_output = "/usr/local/bin/binkrd/rss/" . "$pub" . ".headlines.rss";

    open(FILE,">$headlines_file") or die "Couldn't open $headlines_file: $!";
    print FILE $content;
    close FILE;

    # we create a new rss object.
    my $rss = new XML::RSS (version => '0.91');
    $rss->channel(title => "$pub" . ".com",
                  link  => "http://www." . "$pub" . ".com/"
                 );

    open(FILE, "<$headlines_file") or die "Couldn't open $headlines_file: $!";
    while (<FILE>) {
        if ($_ =~ /<a href=\".*\" class=\"digest-headline\">.*<\/a><br /) {
            my ($link, $headline) =
                /<a href=\"(.*)\" class=\"digest-headline\">(.*)<\/a><br \/>/i;
            if ($link and $headline) {
                $link = "http://www." . "$pub" . ".com/mld/" . $pub . "/rss/" . "$link";
                $rss->add_item(title=>$headline,
                               link=>$link
                              );
            }
            next;
        }
    }
    close(FILE);

    $rss->save($rss_output);
}
exit;