If you have CF Enterprise, wouldn't this be a great task for the async gateways?
I don't have Enterprise, so I can only dream...

M!ke

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
Sent: Friday, June 03, 2005 4:36 PM
To: [email protected]
Subject: [CFCDev] Spider

I'm trying to write a CFC that will spider a website and create an inventory of all the pages/files on the website. It's a fairly simple program but awfully slow. I store the page list in a structure called request.tree. Here is the function:

<cffunction name="get">
    <cfargument name="incomingURL" type="string">
    <cfset var local=structNew()>
    <cfhttp url="#arguments.incomingURL#" method="get" resolveurl="yes"/>
    <cfscript>
        local.fileContent=cfhttp.fileContent;
        request.tree[arguments.incomingURL] = structnew();
        request.tree[arguments.incomingURL].linksArray=arraynew(1);
        request.tree[arguments.incomingURL].hash=hash(local.fileContent);
        local.startLink = findnocase('http://',local.fileContent,1);
        while (local.startLink) {
            local.endlink=min(findnocase('>',local.fileContent,local.startLink),findnocase(' ',local.fileContent,local.startLink));
            local.link=trim(mid(local.fileContent,local.startLink,local.endlink-local.startLink));
            local.link=replace(local.link,chr(34),'',"ALL");
            local.link=replace(local.link,'>','',"ALL");
            local.link=replace(local.link,chr(32),'',"ALL");
            arrayappend(request.tree[arguments.incomingURL].linksArray,local.link);
            if ( local.link contains request.base and not structkeyexists(request.tree,local.link) ) {
                get(incomingURL=local.link,level=arguments.level+1);
            }
            local.startLink=findnocase('http://',local.fileContent,local.endlink);
        }
    </cfscript>
    <cfreturn />
</cffunction>

Unfortunately, it's painstakingly slow even for fairly simple sites. Can anybody make any suggestions?

Jason Cronk
[EMAIL PROTECTED]
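As a side note on the link-scanning loop in the quoted function: findnocase() returns 0 when its substring is not found, so if no '>' or no space follows a link, min() yields 0 and the mid() call gets a negative count. Below is a minimal sketch of the same extraction step done with reFindNoCase() instead. The function name extractLinks and the choice of delimiters (whitespace, quotes, angle brackets) are my own assumptions, not part of the original post, and this only tidies the parsing; it does nothing about the time spent in the cfhttp requests themselves.

```cfml
<cffunction name="extractLinks" returntype="array" output="false">
    <cfargument name="html" type="string" required="true">
    <cfscript>
        // Hypothetical helper: collects every absolute http:// URL in the page text.
        var links = arrayNew(1);
        var pos = 1;
        var match = "";
        // Assumed delimiters: a link ends at whitespace, a quote, or an angle bracket.
        var pattern = "http://[^\s""'<>]+";

        // With returnsubexpressions=true, reFindNoCase returns a struct whose
        // pos[1] and len[1] describe the whole match; pos[1] is 0 when nothing matches.
        match = reFindNoCase(pattern, arguments.html, pos, true);
        while (match.pos[1] gt 0) {
            arrayAppend(links, mid(arguments.html, match.pos[1], match.len[1]));
            // Resume scanning just past the end of the link that was found.
            pos = match.pos[1] + match.len[1];
            match = reFindNoCase(pattern, arguments.html, pos, true);
        }
        return links;
    </cfscript>
</cffunction>
```

Inside get(), the while loop could then be replaced by something like `local.links = extractLinks(cfhttp.fileContent)` followed by a loop over that array, keeping the existing request.base and structkeyexists() checks before each recursive call.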
