I'm trying to write a CFC that will spider a website and create an
inventory of all the pages/files on the website. It's a fairly simple
program, but awfully slow. I build the page list in a structure called
request.tree. Here is the function:
<cffunction name="get">
<cfargument name="incomingURL" type="string">
<cfargument name="level" type="numeric" default="0">
<cfset var local=structNew()>
<cfhttp url="#arguments.incomingURL#" method="get" resolveurl="yes"/>
<cfscript>
local.fileContent=cfhttp.fileContent;
request.tree[arguments.incomingURL] = structnew();
request.tree[arguments.incomingURL].linksArray=arraynew(1);
request.tree[arguments.incomingURL].hash=hash(local.fileContent);
local.startLink=findnocase('http://',local.fileContent,1);
while (local.startLink)
{
local.endlink=min(findnocase('>',local.fileContent,local.startLink),findnocase(' ',local.fileContent,local.startLink));
local.link=trim(mid(local.fileContent,local.startLink,local.endlink-local.startLink));
local.link=replace(local.link,chr(34),'',"ALL");
local.link=replace(local.link,'>','',"ALL");
local.link=replace(local.link,chr(32),'',"ALL");
arrayappend(request.tree[arguments.incomingURL].linksArray,local.link);
if ( local.link contains request.base and not structkeyexists(request.tree,local.link) )
{
get(incomingURL=local.link,level=arguments.level+1);
}
local.startLink=findnocase('http://',local.fileContent,local.endlink);
}
</cfscript>
<cfreturn />
</cffunction>
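
For context, here is a minimal sketch of how the function might be invoked. It assumes that request.base and request.tree are set up before the first call, since the function reads the former and writes to the latter. The component name "spider" and the example.com URL are placeholders for illustration only.

```cfml
<cfscript>
// Assumed setup: get() compares each link against request.base and
// records every visited URL in request.tree, so both must exist first.
request.base = "http://www.example.com";   // placeholder site root
request.tree = structNew();                // one key per URL visited

// "spider" is a hypothetical name for the CFC that contains get().
spider = createObject("component", "spider");
spider.get(incomingURL=request.base, level=0);
</cfscript>

<!--- Each entry in request.tree holds a linksArray and a hash of the page body. --->
<cfdump var="#request.tree#">
```

Every link that contains request.base and is not yet a key in request.tree triggers another cfhttp request from inside the loop, so the whole crawl runs as one chain of sequential, recursive fetches.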
Unfortunately, it's painfully slow even for fairly simple sites. Can
anybody make any suggestions?
Jason Cronk