I'm trying to write a script with XML::RSS. Using an example I found on the
web, I've gotten it to pull headlines and links from an HTML document, but I
am having difficulty understanding how to also get the lead paragraphs into
a $desc variable. I have looked at the man pages and searched the web, but I
may be missing something: my Perl skills go only so far and I'm new to RSS.

I've included my script so far below. Here are two examples of the text I am
working with, marked up to show where the headline, subhead and lead
paragraph are. As you will see, not every story contains a subhead, but
every story contains a headline and lead paragraph. Besides the headline and
link, I want to use the subhead as the description when there is one, and
the lead when there is not.

Clearly, the time to look for this is while going through the HTML line by
line looking for the headline and link, but I'm having trouble finding
examples of 1) how XML::RSS looks for descriptions and 2) how to include
them in $rss->add_item. Any help is much appreciated.

Text example #1:

<p>
<a href="9864175.htm" class="digest-headline">HEADLINE: Girl dies from
apparent overdose</a><br />
<span class="digest-headline2">SUBHEAD: Family and friends say the
17-year-old, who was in a coma for 24 days, waged a battle against drug
addiction</span><br />

LEAD: Seventeen-year old Amberly Dana Gray of Bradenton died in a St.
Petersburg hospital late Wednesday after never regaining consciousness from
an apparent cocaine overdose, a relative said.<br>

Text example #2:

<p>
<a href="9864168.htm" class="digest-headline">HEADLINE: Probe: Cost-cutting
hurt youths' care</a><br />

LEAD: A probe into health care at a Miami boarding school for delinquents
has found that children sometimes received poor medical care because school
officials worried about costs.<br>
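
The fallback described above (subhead when present, otherwise the lead) can be sketched roughly as follows. This is only an illustration under some assumptions: the whole page is held in one string instead of being read line by line, every story starts at a digest-headline anchor, and the lead is the plain text between the last headline or subhead <br /> and the next <br>. The helper name extract_items and the $sample data are hypothetical, not part of XML::RSS.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one hash per story with title, link and
# description (the subhead if there is one, otherwise the lead).
sub extract_items {
    my ($html) = @_;
    my @items;

    # Split just before each digest-headline anchor, one block per story.
    my @blocks = split /(?=<a\s+href="[^"]*"\s+class="digest-headline">)/i, $html;

    for my $block (@blocks) {
        # /s lets . match newlines, because the markup wraps across lines.
        my ($link, $headline) =
            $block =~ /<a\s+href="([^"]*)"\s+class="digest-headline">(.*?)<\/a>/is
            or next;

        my ($subhead) =
            $block =~ /<span\s+class="digest-headline2">(.*?)<\/span>/is;

        # Assumption: the lead is the tag-free text that follows the
        # headline's or subhead's <br /> and runs up to the next <br>.
        my ($lead) =
            $block =~ /(?:<\/a>|<\/span>)\s*<br\s*\/?>\s*([^<]+?)\s*<br\s*\/?>/is;

        # Prefer the subhead, fall back to the lead.
        my $desc = (defined $subhead && $subhead =~ /\S/) ? $subhead : $lead;
        $desc = '' unless defined $desc;

        # Collapse wrapped lines into single spaces and trim the ends.
        for ($headline, $desc) { s/\s+/ /g; s/^ | $//g; }

        push @items, { title => $headline, link => $link, description => $desc };
    }
    return @items;
}

# Sample data copied from the two examples in this post.
my $sample = <<'HTML';
<p>
<a href="9864175.htm" class="digest-headline">HEADLINE: Girl dies from
apparent overdose</a><br />
<span class="digest-headline2">SUBHEAD: Family and friends say the
17-year-old, who was in a coma for 24 days, waged a battle against drug
addiction</span><br />

LEAD: Seventeen-year old Amberly Dana Gray of Bradenton died in a St.
Petersburg hospital late Wednesday after never regaining consciousness from
an apparent cocaine overdose, a relative said.<br>
<p>
<a href="9864168.htm" class="digest-headline">HEADLINE: Probe: Cost-cutting
hurt youths' care</a><br />

LEAD: A probe into health care at a Miami boarding school for delinquents
has found that children sometimes received poor medical care because school
officials worried about costs.<br>
HTML

for my $item (extract_items($sample)) {
    print "$item->{title}\n  $item->{link}\n  $item->{description}\n";
}
```

Each hash could then be handed to $rss->add_item(title => ..., link => ..., description => ...), since description is one of the item keys add_item accepts. The link would still need the site prefix added, as the script below does.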

Script:

#!/usr/bin/perl -w

use XML::RSS;
use LWP::Simple;
use strict;

use vars qw($today $mday $wday $yday $isdst $length $mon $month $year $sec
$min $hour $pubDate $js);

my @pub = ('bradenton','centredaily');

foreach my $pub (@pub)  

{  # start foreach pub loop (#1)

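        print "Processing publication: $pub....\n";
        my $url = "http://www." . "$pub" . ".com/mld/" . "$pub" . "/rss/";

        print "Getting content from url: $url\n";
        # get() returns undef on failure and does not set $!, so test the result.
        my @lines = get($url);
        defined $lines[0] or die "get: couldn't fetch $url\n";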

        my $headlines_file = "temp/" . "$pub" . ".tmp";
        my $rss_output = "/usr/local/bin/binkrd/rss/" . "$pub" . ".headlines.rss";


        open(FILE, ">$headlines_file") or die "Couldn't open $headlines_file: $!";
        print FILE @lines;
        close FILE;

        # we create a new rss object.
        my $rss = XML::RSS->new(version => '0.91');

        $rss->channel(title => "$pub" . ".com",
              link => "http://www." . "$pub" . ".com/"
               );

        open(FILE, "<$headlines_file") or die "Couldn't open $headlines_file: $!";


        while (<FILE>) {
            # One match both tests the line and captures link and headline.
            # [^"]* stops the href capture at its closing quote.
            my ($link, $headline) =
                /<a href="([^"]*)" class="digest-headline">(.*)<\/a><br \/>/i;

            if ($link and $headline) {
                $link = "http://www." . "$pub" . ".com/mld/" . $pub . "/rss/" . "$link";
                $rss->add_item(title => $headline, link => $link);
            }
        }

        close(FILE);

        $rss->save($rss_output);

}
exit;

