> We'd like to pull meta tags from the home page of various websites.
>
> Here's how we'd like for this to work:
> 1. SQL table listing over 3,000 URLs is queried.
> 2. Pull the meta tags and description from each of the home pages of these 
> websites.
> 3. Insert these meta tags into a database.
>
> What's the best way to accomplish this? In particular, how do we scrape the 
> meta tags using CF8?

The best way to accomplish this would probably be to use something
other than CF, which is not intended for this kind of batch job. There
are all sorts of products, free and otherwise, that can do individual
parts of this without being tied to the request/response model that CF
is designed to work within.

If I had to do this, I think I'd use Python to query the database for
your list of URLs and write them to a file, then pass that file to
wget to fetch the URLs, then use Python again to parse the metadata
from the fetched URLs and write that to the database.
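
As a rough sketch of the parsing step only, something like the
following might work, using nothing but Python's standard library.
The names MetaTagParser and extract_meta are made up for illustration,
and it assumes the fetched page has already been read into a string;
the database query, the wget run and the insert are left out.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        if name:
            # Later duplicates of the same name overwrite earlier ones.
            self.meta[name] = attr_map.get("content", "")


def extract_meta(html_text):
    """Return a dict mapping meta tag names (lowercased) to their content."""
    parser = MetaTagParser()
    parser.feed(html_text)
    return parser.meta


if __name__ == "__main__":
    sample = (
        '<html><head><title>Example</title>'
        '<META NAME="Description" CONTENT="A test site">'
        '<meta name="keywords" content="a, b" />'
        '</head><body></body></html>'
    )
    for name, content in extract_meta(sample).items():
        print(name, "=", content)
```

This only picks up meta tags that carry a name attribute (description,
keywords and so on); tags that use http-equiv or property instead are
skipped, so adjust that if you need them.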

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
http://training.figleaf.com/

