On Wed, 04 Jun 2003 23:07:08 +0200
"H.J.Bathoorn" <[EMAIL PROTECTED]> wrote:

> On Tuesday 03 June 2003 07:12, Todd Slater wrote:
> > http://clevername.homeip.net/gnews2
> >
> > I've tested it out a little and it seems to work. If no new
> > headlines are available, it just says "No new headlines" in the
> > email. Note that it still has to pull all the pages from google.
> >
> > Put it in cron and run every hour? Let me know of any bugs, er,
> > features!
> >
> > Todd
> 
> running it!.....looks fine no probs (except I had to install &
> configure sendmail).
> 
> Wouldn't it be a good idea to show the first 5 or 10 lines from the
> extracted URL so one gets a better description of the article itself?
> 
> Good luck,
> HarM

That would be nice. I thought about trying to include the blurb from
Google but decided against it, especially with the version that only
sends new headlines. The reason is that the script
1. pulls the page(s) from Google
2. strips out lines in the html with headlines (and writes to a file)
3. strips out the headline text and writes that to a file
4. strips out the url and writes that to a file
5. greps the old headline and url file for matches
6. if no match is found, writes that to a new file (one each for
headline and url)
7. pastes new headline and new urls together
8. puts headlines and urls for each category in a mail
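For comparison, here is a minimal Python sketch of steps 5 to 8. It is not the gnews2 script itself (that is a shell pipeline and is not reproduced here). It assumes the headlines have already been scraped into (headline, url) pairs and that "already seen" means the URL appeared on the previous run; the names `filter_new` and `build_mail` are hypothetical. Keeping each headline and its URL in one record, instead of in two parallel files joined afterwards, is one way to keep them matched up if extra text such as a blurb were added later.

```python
from typing import Dict, List, Set, Tuple

# One record per headline: keeping the headline and its URL together
# means they cannot drift out of step the way two parallel files can.
Item = Tuple[str, str]  # (headline, url)


def filter_new(items: List[Item], old_urls: Set[str]) -> List[Item]:
    """Steps 5-6: drop items whose URL was already seen on a previous run."""
    return [(hl, url) for hl, url in items if url not in old_urls]


def build_mail(by_category: Dict[str, List[Item]]) -> str:
    """Steps 7-8: pair each headline with its URL, grouped by category."""
    lines: List[str] = []
    for category, items in by_category.items():
        if not items:
            continue  # skip categories with nothing new
        lines.append(category.upper())
        for hl, url in items:
            lines.append(f"{hl}\n  {url}")
        lines.append("")
    # Mirror the script's behaviour when nothing new turned up.
    return "\n".join(lines) if lines else "No new headlines"


if __name__ == "__main__":
    old = {"http://example.com/a"}
    fetched = {
        "world": [("Old story", "http://example.com/a"),
                  ("New story", "http://example.com/b")],
        "sports": [],
    }
    fresh = {cat: filter_new(items, old) for cat, items in fetched.items()}
    print(build_mail(fresh))
```

After a run, the URLs just mailed would be added to the saved "old" set so the next run skips them.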

It is beyond my skill right now to throw in extra text and guarantee
that everything matches up like it does now :).

As far as actually visiting the urls of the news sources, that would
take quite a bit of work: wget would have to retrieve each page, and
then the script would have to filter out just the first few lines of
the article text.
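A rough Python sketch of the filtering half, assuming the article page has already been fetched (for example saved to a file with wget) and is passed in as a string. It uses only the standard library's `html.parser`; the class `TextLines` and function `first_lines` are hypothetical names. It simply returns the first few non-empty text chunks and skips `<script>` and `<style>` contents. On real news sites those first chunks are often menus or navigation links rather than the article, which is exactly why this is harder than it looks.

```python
from html.parser import HTMLParser
from typing import List


class TextLines(HTMLParser):
    """Collect non-empty text chunks, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip:
            self.chunks.append(text)


def first_lines(html: str, n: int = 5) -> List[str]:
    """Return the first n text chunks of a page, e.g. one saved by wget."""
    parser = TextLines()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.chunks[:n]
```

Usage would be along the lines of `first_lines(open("article.html").read(), 5)` after a `wget -O article.html <url>`.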

Today's the first day I really ran it, and I only retrieve 2 headlines
for business, health, entertainment, and sports, and 5 for usa and
world, and all for technology. Running it every 2-3 hours gave me new
headlines in each category every time.

BTW, you can remove or comment out the lines about "Old hl exists". That
was just for debugging and I forgot to take it out of the script.
Removing them should save you from getting mail output from cron.

Todd

