> We'd like to pull meta tags from the home page of various websites.
>
> Here's how we'd like for this to work:
> 1. A SQL table listing over 3,000 URLs is queried.
> 2. Pull the meta tags and description from each of the home pages of these
>    websites.
> 3. Insert these meta tags into a database.
>
> What's the best way to accomplish this? In particular, how do we scrape the
> meta tags using CF8?

The best way to accomplish this would probably be to use something other than CF, which is not intended for this kind of thing. There are all sorts of products, free and otherwise, that can do individual parts of this without being tied to the request/response model that CF is designed to work within.

If I had to do this, I think I'd use Python to query the database for your list of URLs and write them to a file, pass that file to wget to fetch the pages, then use Python again to parse the metadata from the fetched pages and write it to the database.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
http://training.figleaf.com/

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:331877
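
For what it's worth, the "use Python again to parse the metadata" step could look roughly like the sketch below. It is only an illustration, not code from the reply: it assumes the pages have already been fetched to disk (by wget or anything else) and uses the standard library's html.parser. The names MetaTagParser and extract_meta are made up for this example, and the database read/write steps are left out because they depend on your schema.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects key/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Pages use name=, property= (Open Graph) or http-equiv= as the key.
        key = attrs.get("name") or attrs.get("property") or attrs.get("http-equiv")
        content = attrs.get("content")
        if key and content is not None:
            self.meta[key.lower()] = content.strip()


def extract_meta(html_text):
    """Return a dict mapping lowercased meta keys to their content values."""
    parser = MetaTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.meta


if __name__ == "__main__":
    sample = (
        '<html><head>'
        '<meta name="Description" content="An example home page">'
        '<meta name="keywords" content="example, demo" />'
        '</head><body></body></html>'
    )
    for key, value in extract_meta(sample).items():
        print(key, "=", value)
```

In practice you would read each fetched file, call extract_meta on its contents, and insert the resulting rows into your table. Real-world home pages are often malformed, so expect to add error handling and character-encoding detection around this.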

