It should take you all of 2 minutes to write one using cfhttp with the
useragent attribute set to something like "Mozilla/5.0 (Windows; U;
Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10".

This is a fragment of a bot I wrote to go to a manga site and download
all of the images of a specific comic:
<cfset useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10">
<cfset siteroot="http://www.onemanga.com">
<cfset Seriesname="Change_123">
<cfhttp url="#siteroot#/#Seriesname#/" useragent="#useragent#">
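
Since the site in question is a secure one, here is a rough sketch of the same call with a login and a status check added. It assumes the site uses HTTP basic authentication, which may not be the case. The siteuser and sitepass values, the "crawler" log file name and the response variable are made up for illustration and are not part of the original bot.

```cfml
<cfset useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10">
<cfset siteroot="http://www.onemanga.com">
<cfset Seriesname="Change_123">

<!--- Hypothetical credentials; only useful if the site uses HTTP basic authentication --->
<cfset siteuser="crawler">
<cfset sitepass="secret">

<!--- result= puts the response in its own struct instead of the shared cfhttp variable --->
<cfhttp url="#siteroot#/#Seriesname#/" useragent="#useragent#" username="#siteuser#" password="#sitepass#" timeout="30" result="response">

<!--- statuscode looks like "200 OK"; only keep pages that came back fine --->
<cfif left(response.statuscode, 3) eq "200">
    <cfset page=response.filecontent>
<cfelse>
    <cflog file="crawler" text="Skipped #siteroot#/#Seriesname#/ (#response.statuscode#)">
</cfif>
```

If the site uses a login form rather than basic authentication, the username and password attributes will not help and the bot would have to post the form and carry the session cookie itself.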

Oh, you'll also need some regex to identify urls and other navigation
elements to populate the 'next' cfhttp calls. I usually use something
like this:
<cfset regex=structnew()>
<cfset regex.pagetrim='^.+?<!-- Start of ''content'' container -->(.+?)<div id="footer">.+$'>
<cfset page=rereplacenocase(cfhttp.filecontent, regex.pagetrim, '\1')>

That particular regex trims out the stuff I don't need from the page
so I can run other regex on it to get the content I do need. This
makes the job faster as there is less to parse through, especially if
you're doing replaces.
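
As for the regex that finds the urls for the 'next' cfhttp calls, a minimal sketch could look something like this. The sample markup, the regex.links pattern and the links array are made up for illustration; a real page will most likely need a tighter pattern. It uses rematchnocase and cfloop over an array, so it needs ColdFusion 8 or later.

```cfml
<!--- Stand-in for the trimmed page; in the bot this would be the result of the pagetrim step --->
<cfset page='<ul><li><a href="/Change_123/1/">Chapter 1</a></li><li><a href="/Change_123/2/">Chapter 2</a></li></ul>'>
<cfset siteroot="http://www.onemanga.com">

<!--- Hypothetical pattern: capture the href value of every anchor tag --->
<cfset regex=structnew()>
<cfset regex.links='<a\s[^>]*href="([^"]+)"'>

<!--- rematchnocase returns an array holding the full text of each match --->
<cfset matches=rematchnocase(regex.links, page)>
<cfset links=arraynew(1)>
<cfloop array="#matches#" index="m">
    <!--- Reduce each match to the captured href and prefix the site root --->
    <cfset arrayappend(links, siteroot & rereplacenocase(m, regex.links, '\1'))>
</cfloop>

<!--- Each entry in links can now be fed to the next cfhttp call --->
<cfdump var="#links#">
```

This assumes the hrefs are root-relative, as in the sample markup. Absolute or page-relative links would need to be handled before prefixing siteroot.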

On Tue, Jul 7, 2009 at 3:40 PM, [email protected] wrote:
>
> At my job we have a secure website. Every hit to the site is captured by the 
> tracking system to the SQL Server database.
>
> We need to create an inventory system that can look at the data and tell us 
> about the assets on the site.
>
> To get the appropriate data into the database, we need to use a directory 
> crawler that can hit every asset and every item that it finds in the 
> directory structure.
>
> Is there such a crawler, that can appear to be a user-agent, that can crawl a 
> secure website? Is there such a crawler in ColdFusion?
>
> Any ideas or pointers to such a crawler would be very much appreciated.
>
> Thanks in advance,
>
> Jo-Anne Head
>
> 

Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:324331