It should take you all of two minutes to write one using cfhttp with the useragent attribute set to something like "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10".
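
As a rough illustration of that approach, a minimal sketch could look like the one below. It is untested against any real site and assumes ColdFusion 8 or later (for reMatchNoCase and the array attribute of cfloop). The start URL is a hypothetical placeholder, and the href pattern is deliberately naive: it only catches double-quoted href values and does not resolve relative links.

```cfml
<!--- Hypothetical start URL: replace with the root of the site to crawl --->
<cfset startUrl = "http://www.example.com/">
<cfset useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10">

<!--- Fetch the page while presenting a browser-like user agent --->
<cfhttp url="#startUrl#" method="get" useragent="#useragent#" timeout="30">

<!--- cfhttp.statusCode is a string such as "200 OK" --->
<cfif left(cfhttp.statusCode, 3) eq "200">
    <!--- Naive pattern (an assumption): collect every href="..." attribute --->
    <cfset links = reMatchNoCase('href="[^"]+"', cfhttp.fileContent)>

    <cfloop array="#links#" index="link">
        <!--- Strip the leading href=" and the trailing quote --->
        <cfset href = reReplaceNoCase(link, '^href="|"$', '', 'all')>
        <cfoutput>#htmlEditFormat(href)#<br></cfoutput>
    </cfloop>
</cfif>
```

Each extracted href could then be queued for its own cfhttp call, with a list of already-visited URLs kept so the crawler does not loop.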
This is a fragment of a bot I wrote to go to a manga site and download all of the images of a specific comic.

<cfset useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10">
<cfset siteroot="http://www.onemanga.com">
<cfset Seriesname="Change_123">
<cfhttp url="#siteroot#/#Seriesname#/" useragent="#useragent#">

Oh, you'll also need some regex to identify URLs and other navigation elements to populate the 'next' cfhttp calls. I usually use something like this:

<cfset regex=structnew()>
<cfset regex.pagetrim='^.+?<!-- Start of ''content'' container -->(.+?)<div id="footer">.+$'>
<cfset page=rereplacenocase(cfhttp.filecontent, regex.pagetrim, '\1')>

That particular regex trims out the stuff I don't need from the page so I can run other regex on it to get the content I do need. This makes the job faster as there is less to parse through, especially if you're doing replaces.

On Tue, Jul 7, 2009 at 3:40 PM, [email protected] [email protected]<[email protected]> wrote:
>
> At my job we have a secure website. Every hit to the site is captured by the
> tracking system to the SQL Server database.
>
> We need to create an inventory system that can look at the data and tell us
> about the assets on the site.
>
> To get the appropriate data into the database, we need to use a directory
> crawler that can hit every asset and every item that it finds in the
> directory structure.
>
> Is there such a crawler, that can appear to be a user-agent, that can crawl a
> secure website? Is there such a crawler in ColdFusion?
>
> Any ideas or pointers to such a crawler would be very much appreciated.
>
> Thanks in advance,
>
> Jo-Anne Head

Archive: http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:324331

