Oops, sorry. That's what I get for not making you guys wade through a very
large page of crap code.
<CFSCRIPT>
// De-dupe a list: sort it so duplicate items become adjacent, then use a
// backreference regex to collapse each run of repeats down to one item.
function DeDupe(list, type) {
	return REReplaceNoCase(ListSort(list, type), "([^,]+)(,\1)*", "\1", "ALL");
}
// Convert getTickCount() milliseconds to seconds, one decimal place.
function msToSec(tick) {
	return numberFormat(tick / 1000, "9999.9");
}
</CFSCRIPT>
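For example (a quick illustration -- "b,a,c,a,b" is just a made-up sample
list): the sort turns it into "a,a,b,b,c", and the regex then collapses each
run of repeats.

<CFSCRIPT>
// ListSort("b,a,c,a,b", "text") gives "a,a,b,b,c"; the replace
// then reduces each repeated run to a single item: "a,b,c".
WriteOutput(DeDupe("b,a,c,a,b", "text"));
</CFSCRIPT>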
----- Original Message -----
From: "Robert Everland" <[EMAIL PROTECTED]>
To: "CF-Talk" <[EMAIL PROTECTED]>
Sent: Monday, June 17, 2002 3:49 PM
Subject: RE: CFMX Spidering for cache
> I am playing with the code, looks like you are using a function called
> dedupe, do you have this?
>
> Robert Everland III
> Web Developer Extraordinaire
> Dixon Ticonderoga Company
> http://www.dixonusa.com
>
> -----Original Message-----
> From: Pete Ruckelshaus [mailto:[EMAIL PROTECTED]]
> Sent: Monday, June 17, 2002 3:13 PM
> To: CF-Talk
> Subject: Re: CFMX Spidering for cache
>
>
> Here's a bit of code I wrote (well, it's half-complete, but does what I
> need it to do, which is spider the site and preload the CF cache). Pardon
> the ugliness; you'll probably have to define a couple of variables and
> create a form interface for this, but it's the result of more than a
> couple of hours of work and should be enough to get you started. You could
> start with this and set up an application variable that, if it isn't
> present, triggers this script...so it gets run whenever the service is
> restarted:
>
> Pete
>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>
> <cfset preloadStart = getTickCount()>
> Finding URL's from <cfoutput>#URL.startURL#</cfoutput> and preloading....<br /><br />
> <cfhttp url="#URL.startURL#" method="get" resolveurl="true"></cfhttp>
> <cfparam name="seed_list" default="">
> <cfparam name="ptr" default="1">
> <cfloop condition="ptr LT len(cfhttp.fileContent) AND ptr GT 0">
>     <cfset hit = REFind("(http://[a-zA-Z0-9\.\/\:\-\+\?\&\_\%\=\-]+)", cfhttp.fileContent, ptr, "true")>
>     <cfif hit.pos[1] GT 0>
>         <cfset link_found = Mid(cfhttp.fileContent, hit.pos[1], hit.len[1])>
>         <cfif findNoCase("#URL.startURL#", link_found) AND findNoCase(".htm", link_found) AND NOT findNoCase("/tools/", link_found)>
>             <cfset seed_list = ListAppend(seed_list, link_found)>
>         </cfif>
>         <cfset ptr = hit.pos[1] + 1>
>     <cfelse>
>         <cfset ptr = 0>
>     </cfif>
> </cfloop>
> <cfset seed_list = listSort(seed_list, "TextNoCase")>
> <cfif len(trim(seed_list)) GT 0>
> <b>Seed list generated...</b>
> <cfoutput><ol>
> <cfloop index="i" list="#seed_list#" delimiters=","><li>#i#</li></cfloop>
> </ol></cfoutput>
> <cfelse>
> NO URL's found, aborting...
> <cfabort>
> </cfif>
> <hr>
> <ol>
>     <li>At this point, we have a list of starting URL's. [done]</li>
>     <li>Set a variable called seed_list [done]</li>
>     <li>If it's the first loop iteration, use seed_list [done]</li>
>     <li>If it's a subsequent iteration, use temp_list</li>
>     <li>At the end of each loop, save 2 variables -- full_list, which is
>     ALL of the URL's that just got spidered, and temp_list, which is
>     url_list with the contents of full_list removed, so each page only
>     gets spidered once.</li>
>     <li>After the loops have run, take the contents of url_list and save
>     it to a text file.</li>
> </ol>
> <hr>
> <cfset loopCount = 1>
> <cfloop index="i" from="1" to="#numLoops#">
>     <cfif loopCount IS 1>
>         <!--- set temp_urls to seed_list and use that to surf --->
>         <cfset processed_urls = "http://localhost/default.cfm">
>         <!--- set good_urls to seed_list and use that to store all good values --->
>         <cfset in_process_urls = seed_list>
>     </cfif>
>     <cfoutput>
>     <h3>Loop #i#, processing #listLen(in_process_urls)# URL's (#loopCount#)</h3>
>     <ol>
>     <cfset to_do_urls = "">
>     <cfloop list="#in_process_urls#" index="url">
>         <li>#url# spidered...</li>
>         <cfset finalCount = deDupe(in_process_urls, "text")>
>         <cfhttp url="#url#" method="get" resolveurl="true"></cfhttp>
>         <cfset ptr = 1>
>         <cfloop condition="ptr LT len(cfhttp.fileContent) AND ptr GT 0">
>             <cfset hit = REFind("(http://[a-zA-Z0-9\.\/\:\-\+\?\&\_\%\=\-]+)", cfhttp.fileContent, ptr, "true")>
>             <cfif hit.pos[1] GT 0>
>                 <cfset link_found = Mid(cfhttp.fileContent, hit.pos[1], hit.len[1])>
>                 <cfif findNoCase("#URL.startURL#", link_found) AND findNoCase(".htm", link_found) AND NOT findNoCase("/tools/", link_found)>
>                     <!--- ListFindNoCase (exact match), not ListContainsNoCase, so URL's that merely contain an already-processed URL don't get skipped --->
>                     <cfif listFindNoCase(processed_urls, link_found) EQ 0>
>                         <cfset to_do_urls = ListAppend(to_do_urls, link_found)>
>                     </cfif>
>                 </cfif>
>                 <cfset ptr = hit.pos[1] + 1>
>             <cfelse>
>                 <cfset ptr = 0>
>             </cfif>
>         </cfloop>
>     </cfloop>
>     </ol>
>     </cfoutput>
>
>     <cfset processed_urls = ListAppend(processed_urls, deDupe(in_process_urls, "text"))>
>     <cfset in_process_urls = deDupe(to_do_urls, "text")>
>     <cfset loopCount = loopCount + 1>
>     <table>
>     <tr valign="top">
>         <td>Processed URL's:<ol><cfoutput><cfloop index="i" list="#processed_urls#" delimiters=","><li>#i#</li></cfloop></cfoutput></ol></td>
>         <td>To Do URL's:<ol><cfoutput><cfloop index="i" list="#in_process_urls#" delimiters=","><li>#i#</li></cfloop></cfoutput></ol></td>
>     </tr>
>     </table>
> </cfloop>
>
> <h3>Spidering complete.</h3>
> <cfset preloadFinish = getTickCount()>
> <cfset preloadTime = preloadFinish - preloadStart>
> <cfset preloadTimeSec = preloadtime / 1000>
> <cfset preLoadTimeMin = preLoadTimeSec / 60>
> <cfset secMod = (preLoadTimeSec mod 60)>
>
> <cfoutput>#listLen(processed_urls)# URL's processed in
> <cfif preloadTimeSec LT 60>#msToSec(preLoadTime)# Seconds
> <cfelse>#numberFormat(preloadTimeMin, "9999")# Minutes and #numberFormat(secMod, "9999.9")# Seconds.</cfif>
> <a href="?">Return</a>.</cfoutput>
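>
> The application-variable trick mentioned above might look something like
> this in Application.cfm (just a sketch -- "application.cachePrimed" and
> the /tools/spider.cfm path are made-up names, so adjust them for your own
> setup):
>
> <cflock scope="APPLICATION" type="exclusive" timeout="10">
>     <cfif NOT isDefined("application.cachePrimed")>
>         <!--- first request after a restart: kick off the spider --->
>         <cfhttp url="http://localhost/tools/spider.cfm?startURL=http://localhost/"
>                 method="get"></cfhttp>
>         <cfset application.cachePrimed = true>
>     </cfif>
> </cflock>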
>
>
>
>
>
>
> ----- Original Message -----
> From: "Robert Everland" <[EMAIL PROTECTED]>
> To: "CF-Talk" <[EMAIL PROTECTED]>
> Sent: Monday, June 17, 2002 2:40 PM
> Subject: RE: CFMX Spidering for cache
>
>
> > Ehhhh, I was hoping for there to be a CF solution, because if the
> > server reboots I now have to rely on something external to make sure
> > no one gets a slow application.
> >
> > Robert Everland III
> > Web Developer Extraordinaire
> > Dixon Ticonderoga Company
> > http://www.dixonusa.com
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, June 17, 2002 2:40 PM
> > To: CF-Talk
> > Subject: RE: CFMX Spidering for cache
> >
> >
> > Robert, while not written or designed for the task, I use a product
> > called Black Widow. It is a site grabber, but it works very well at
> > doing exactly what you want to do, and if I remember right the price
> > was right around $30 when I bought my copy of it...
> >
> > HTH,
> > John
> >
> > -----Original Message-----
> > From: Robert Everland [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, June 17, 2002 2:29 PM
> > To: CF-Talk
> > Subject: CFMX Spidering for cache
> >
> >
> > Does anyone know if there is a version of CFMX that offers a spider or
> > a way to compile the webpages so that there isn't a huge latency when
> > someone goes to the site for the first time?
> >
> > Robert Everland III
> > Web Developer Extraordinaire
> > Dixon Ticonderoga Company
> > http://www.dixonusa.com
> >
> >
> >
>
>
______________________________________________________________________
This list and all House of Fusion resources hosted by CFHosting.com. The place for
dependable ColdFusion Hosting.
FAQ: http://www.thenetprofits.co.uk/coldfusion/faq
Archives: http://www.mail-archive.com/[email protected]/
Unsubscribe: http://www.houseoffusion.com/index.cfm?sidebar=lists