<cf_rant>

I still come back to my point (see the summary).

CF is not the correct tool to build a spider.  It's oriented towards web
applications, and towards parsing and inserting HTML/WAP/any other text
effectively (although CF5 will probably change all that).  Using it for
anything beyond a simple spider is not sensible at all.

Look at the Java spiders (or better still, write one!).  It's really not
that difficult to see how you can extend CF using Java to do this.

Of course, you can use cfschedule to guard against crashes and infinite
loops.  Still, when you build an HTTP spider, it should be very quick
indeed.  The process should be:

1: send out an HTTP request for the URL
2: receive information (i.e. an error message or text) back from the server
3: parse the text to get all the info needed and put it into an array
4: start the process again with the next URL
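
To show how little is involved, here is a rough Java sketch of that loop.
It is only an illustration under my own assumptions, not a finished spider:
the class name SpiderSketch, the methods extractLinks and crawl, and the
maxPages cap are all made up for this example, and the fetcher is passed in
as a function so that steps 1 and 2 can be backed by whatever HTTP client
you like. The link pattern is deliberately naive (double-quoted absolute
links only); a real spider would want a proper HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the four-step spider loop; all names are illustrative.
class SpiderSketch {
    // Assumption: only double-quoted absolute http(s) links are of interest.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*\"(https?://[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Step 3: parse the text and collect the links into a list.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Steps 1, 2 and 4: fetch a URL, queue what it links to, move to the next URL.
    // The fetcher returns the page body, or null (or throws) on an error response.
    // maxPages bounds the run so the loop cannot go on forever.
    static Set<String> crawl(String startUrl, Function<String, String> fetcher, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already seen this URL
            }
            String body;
            try {
                body = fetcher.apply(url); // steps 1 and 2
            } catch (RuntimeException e) {
                continue; // treat a failed request as "nothing to parse"
            }
            if (body == null) {
                continue;
            }
            for (String link : extractLinks(body)) { // step 3
                if (!visited.contains(link)) {
                    queue.add(link); // step 4 picks these up
                }
            }
        }
        return visited;
    }
}
```

Calling SpiderSketch.crawl(startUrl, fetcher, 100) returns the URLs visited,
in the order they were visited. Keeping the fetcher separate also means the
parsing and queueing can be tried out against canned pages without touching
the network.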

Doing this in CF takes up time and CF resources that it doesn't need to
take up.  Leave it to some other tool to handle, outside the "web serving"
environment.  Bear in mind that CF's regular expressions are not the best,
and there are much better languages for parsing text.  CF should not
handle too much text parsing.

</cf_rant>

Paul

> -----Original Message-----
> From: Daniel Lancelot [mailto:[EMAIL PROTECTED]]
> Sent: 06 December 2000 11:07
> To: CF-Talk
> Subject: RE: Summary: A CF limitation in building a spider?
>
>
> Depends on the settings in cfadmin - you can set pages to time out after so
> many min/sec to prevent the server from crashing on infinite loops (bad
> programming).
>
> :> -----Original Message-----
> :> From: Bruce Heerssen [mailto:[EMAIL PROTECTED]]
> :> Sent: 05 December 2000 23:43
> :> To: CF-Talk
> :> Subject: RE: Summary: A CF limitation in building a spider?
> :>
> :>
> :> Does anyone know if a template will time out if called from the command
> :> line (e.g. as a scheduled task)? I'm thinking that it will only time out
> :> when called from the browser, and then only because the browser stops
> :> waiting. Can someone confirm or deny this?
> :>
> :> Thanks
> :>
> :> -- Bruce
> :>
> :> > -----Original Message-----
> :> > From: Phill Gibson [mailto:[EMAIL PROTECTED]]
> :> >
> :> > <!-- snip -->
> :> > I will probably also eventually put the CFHTTP one to
> :> > work with <cfschedule>.
> :> >
> :> >
> :> > Phill Gibson
> :> > Velawebs Web Designs
> :> > www.Velawebs.com
> :> > [EMAIL PROTECTED]
> :> >
> :>
> :>
> :>
>