Thanks for the great replies, folks.
Nathan wrote:
>Well, the first suggestion is don't use ColdFusion for this -- not
>really the best tool for the job.
Well, when the only tool you have is a hammer... I don't know Perl, or
I'd probably do it in that.
>Not sure why you necessarily need to keep a hash of the file contents,
>but that is likely slow for big chunks of HTML.
I need to monitor the site for changes. If a page changes then the hash
will change too and I'll keep a new copy.
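
As a rough illustration of that change check (in Python rather than the ColdFusion under discussion; the function names, the in-memory store, and the choice of SHA-256 are all assumptions, not the actual code):

```python
import hashlib

def content_hash(body: bytes) -> str:
    """Return a hex digest of the raw page bytes."""
    return hashlib.sha256(body).hexdigest()

def record_if_changed(store: dict, url: str, body: bytes) -> bool:
    """Keep a new copy of the page only when its hash differs from the stored one.

    `store` is a hypothetical dict mapping URL -> {"hash": ..., "body": ...}.
    Returns True when a new copy was saved.
    """
    digest = content_hash(body)
    previous = store.get(url)
    if previous is not None and previous["hash"] == digest:
        return False  # unchanged since the last crawl
    store[url] = {"hash": digest, "body": body}
    return True
```

Comparing a short digest avoids comparing the full stored page on every crawl; only pages whose digest differs get a new copy.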
>You might find a simple regex to find links is faster than all the
>string manipulation you are doing.
I definitely will do this.
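
A minimal sketch of the regex approach, again in Python for illustration only. The pattern and the helper name `extract_links` are assumptions; the pattern handles only quoted href attributes on anchor tags, so it is not a full HTML parser:

```python
import re
from urllib.parse import urljoin

# Matches href="..." or href='...' inside an <a ...> tag, case-insensitively.
HREF_RE = re.compile(r"""<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

def extract_links(html: str, base_url: str) -> list:
    """Return every anchor href in the page, resolved against base_url."""
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]
```

A single pass with one compiled pattern replaces the repeated find/mid/left style of string slicing.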
>Also, I didn't look too carefully, but it doesn't seem like your code
>really deals with circular references in a web site -- pages that link
>to each other could cause this to just go and go and go, no? Same goes
>for links outside the site -- what causes it to stop crawling (I don't
>see what you do with the LEVEL argument, for instance)?
Yes, it does check for this. It keeps every spidered page in a request
scope structure keyed by URL, so if a URL already exists as a key in the
structure it doesn't respider that page.
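
The idea can be sketched as follows (Python, illustration only). The `visited` dict stands in for the request scope structure; the depth limit and the same-host check are assumptions added to show one way to address the LEVEL and off-site questions, and `fetch_links` is a hypothetical callable that returns the absolute links found on a page:

```python
from urllib.parse import urlparse

def crawl(start_url, fetch_links, max_level=3):
    """Breadth-first crawl that visits each URL at most once.

    fetch_links: hypothetical callable, URL -> list of absolute link URLs.
    Returns a dict mapping each visited URL to the level it was found at.
    """
    site = urlparse(start_url).netloc
    visited = {}                  # URL -> level at which it was first seen
    queue = [(start_url, 0)]
    while queue:
        url, level = queue.pop(0)
        if url in visited:
            continue              # already spidered: breaks circular references
        if urlparse(url).netloc != site:
            continue              # assumed rule: do not follow links off the site
        visited[url] = level
        if level < max_level:     # assumed rule: stop descending past max_level
            for link in fetch_links(url):
                queue.append((link, level + 1))
    return visited
```

Because the key check happens before any fetch, two pages that link to each other are each retrieved once.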
Roland wrote:
>Also, there's no reason to pull down image files, etc. and look for links
>in them since the content is binary, so skip them!
Actually, this is the main purpose of my app. I need to create an inventory
of all the images on a site and show a listing of every place each image is
shown. By hashing the image files I can hopefully identify the same image
that has been renamed and placed on a different part of the site.
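
A sketch of how such an inventory could be keyed by hash (Python, illustration only; the function names, the tuple layout, and SHA-256 are assumptions). Note that this only matches byte-identical files: an image that was re-saved, resized, or recompressed will hash differently.

```python
import hashlib
from collections import defaultdict

def build_inventory(sightings):
    """Group image sightings by content hash.

    sightings: iterable of (page_url, image_url, image_bytes) tuples,
    a hypothetical shape for what the spider collects.
    """
    inventory = defaultdict(lambda: {"image_urls": set(), "pages": set()})
    for page_url, image_url, data in sightings:
        digest = hashlib.sha256(data).hexdigest()
        entry = inventory[digest]
        entry["image_urls"].add(image_url)  # every name the file appears under
        entry["pages"].add(page_url)        # every page that shows it
    return dict(inventory)

def renamed_copies(inventory):
    """Return only the entries where one image is stored under several URLs."""
    return {d: e for d, e in inventory.items() if len(e["image_urls"]) > 1}
```

Each inventory entry then lists both the URLs an image lives at and the pages that display it.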
Again thanks for the suggestions.
Cheers
Jason Cronk