If the page is on your own site, you can use CFDIRECTORY and the
DateLastModified column of its result query to get this information a lot
more quickly.  I wouldn't advise doing byte-by-byte comparisons of
(potentially) large HTML files with ColdFusion.
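
Something along these lines might do it. This is only a rough sketch: the
directory, the file name, and the Application.lastSeen variable are all
made up for illustration, and it assumes CFAPPLICATION is already set up so
the Application scope is available.

```cfml
<!--- Hypothetical values: adjust the directory and file name for your site. --->
<cfset pageDir = "C:\inetpub\wwwroot\reports">
<cfset pageFile = "status.html">

<!--- List only the one file; the result query has a DateLastModified column. --->
<cfdirectory action="list" directory="#pageDir#" filter="#pageFile#" name="pageInfo">

<!--- Assumption: Application.lastSeen holds the timestamp seen on the previous run. --->
<cfif NOT IsDefined("Application.lastSeen")>
    <cfset Application.lastSeen = CreateDate(1970, 1, 1)>
</cfif>

<cfif pageInfo.RecordCount EQ 1
      AND DateCompare(pageInfo.DateLastModified, Application.lastSeen) GT 0>
    <!--- The file is newer than the last one we saw: fetch and parse it here. --->
    <cfset Application.lastSeen = pageInfo.DateLastModified>
    <cfoutput>#pageFile# changed at #pageInfo.DateLastModified#</cfoutput>
<cfelse>
    <cfoutput>No change to #pageFile#</cfoutput>
</cfif>
```

Comparing one timestamp avoids reading the file at all unless it has
actually changed.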

-----Original Message-----
From: Larry W. Virden [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 16, 2000 5:02 PM
To: CF-Talk
Subject: Anyone have a guide to writing web scrapers?



I would like to write a CF app that checks whether a web page has been
updated and, if so, fetches the page, extracts the useful info from it,
and writes it out, perhaps as a CSV-formatted file.

I'm certain that others have done this kind of thing.  What I don't know
is exactly how to get started.  Is there perhaps a tutorial around to walk
through doing this sort of thing?

Is the best approach to try to parse the html, or to grab plain text
and try to parse that?
--
Never apply a Star Trek solution to a Babylon 5 problem.
Larry W. Virden <mailto:[EMAIL PROTECTED]> <URL:
http://www.purl.org/NET/lvirden/>
Even if explicitly stated to the contrary, nothing in this posting should
be construed as representing my employer's opinions.
-><-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Structure your ColdFusion code with Fusebox. Get the official book at
https://secure.houseoffusion.com

Archives: http://www.mail-archive.com/[email protected]/
Unsubscribe: http://www.houseoffusion.com/index.cfm?sidebar=lists
