I think you could optimize it better, since you have control over what Java code is being generated. Unless you think the Macromedia interpreter can guess and do a better job?

I don't necessarily think the Java copy I posted about is faster, but I think it could be improved enough that it would be.

Anyway, he was asking for suggestions and I gave one. Like anything posted on this list, use at your own risk.

Bill

On 6/5/05, Roland Collins <[EMAIL PROTECTED]> wrote:

But CF compiles to Java bytecode and runs on the same JVM that the Java version would anyway, so I don't know why you think it would be faster just because it's a "native" Java implementation.

 

Roland

 


From: [EMAIL PROTECTED] [mailto: [EMAIL PROTECTED]] On Behalf Of Bill Rawlinson
Sent: Sunday, June 05, 2005 10:41 AM


To: [email protected]
Subject: Re: [CFCDev] Spider

 

I think that, depending on your circumstances, any solution will be slow, since it has to get each page and parse it. But I'm sure that without a lot of work the Java solution could be made to run much faster.



On 6/4/05, Roland Collins <[EMAIL PROTECTED]> wrote:

It is sloooooow ;)

 


From: [EMAIL PROTECTED] [mailto: [EMAIL PROTECTED] ] On Behalf Of Bill Rawlinson
Sent: Saturday, June 04, 2005 7:32 AM
To: [email protected]
Subject: Re: [CFCDev] Spider

 

There is a free Java class library out there that does this.
One class can spider a site, and another can make a copy of any site that it can access; just give it the base URL and bam.

http://www.acme.com/java/software/Acme.Spider.html
(here is an implementation of it as an applet: http://www.acme.com/java/software/WebList.html )
http://www.acme.com/java/software/WebCopy.html

Don't reinvent the wheel :O)



Bill

On 6/3/05, Roland Collins < [EMAIL PROTECTED]> wrote:

Use Regular Expressions!!!  Also, there's no reason to pull down image
files, etc. and look for links in them since the content is binary, so skip
them!  After rewriting your function using RE and ignoring images, it seems
to run on average 3x faster at 3 levels deep.  This should be almost an
exponential savings relative to the depth of the spider due to the pruning
of the files pulled.
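
For anyone weighing the Java route discussed elsewhere in this thread, here is a minimal sketch of the same two ideas: pull links out with a regular expression, and skip URLs that look like binary files. It is not the attached CFC and not Acme.Spider. The class name, the method names, and the list of extensions are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class LinkExtractor {
    // Matches href="..." or href='...' (any case) and captures the URL in group 2.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Assumed list of extensions treated as binary; adjust for your site.
    private static final String[] BINARY_EXTENSIONS = {
            ".gif", ".jpg", ".jpeg", ".png", ".ico", ".pdf", ".zip"};

    // Returns every quoted href value found in the page, in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    // True when the URL path (ignoring any query string) ends in a binary extension.
    static boolean looksBinary(String url) {
        String path = url.toLowerCase(Locale.ROOT);
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query);
        }
        for (String ext : BINARY_EXTENSIONS) {
            if (path.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }
}
```

A spider would call extractLinks on each fetched page and only fetch the links for which looksBinary returns false. Unquoted href values are not matched by this pattern, so treat it as a starting point.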

Attached is a CFC that contains a modified version of your function.  To use
it, initialize it and say go!

<cfset spider = createObject("component",
"Spider").init("http://www.yoursite.com:80", 3)>
<cfset results = spider.get()>
<cfdump var="#results#">

This requires CF7.  If you don't have CF7, replace
"local.httpResult.fileContent" with "cfhttp.fileContent" and remove
result="local.httpResult" from the cfhttp tag.
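
To show why a depth limit plus a "seen" set keeps the work bounded, here is a rough Java sketch of a level-by-level crawl. It is not the attached CFC. The class name, the constructor parameters, and the fetchLinks function (which stands in for "fetch this page and return its links") are assumptions for illustration, so the example runs without any network access.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch; names are illustrative only.
class DepthLimitedSpider {
    private final String base;
    private final int maxDepth;
    // Assumed callback: given a URL, fetch the page and return the links on it.
    private final Function<String, List<String>> fetchLinks;

    DepthLimitedSpider(String base, int maxDepth,
                       Function<String, List<String>> fetchLinks) {
        this.base = base;
        this.maxDepth = maxDepth;
        this.fetchLinks = fetchLinks;
    }

    // Returns the URLs fetched, in fetch order. The base page counts as depth 1.
    Set<String> crawl() {
        Set<String> fetched = new LinkedHashSet<>();
        Set<String> queued = new HashSet<>();
        List<String> current = new ArrayList<>();
        current.add(base);
        queued.add(base);
        for (int depth = 1; depth <= maxDepth && !current.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String url : current) {
                fetched.add(url);
                for (String link : fetchLinks.apply(url)) {
                    // Stay on the site and never queue the same URL twice.
                    if (link.startsWith(base) && queued.add(link)) {
                        next.add(link);
                    }
                }
            }
            current = next;
        }
        return fetched;
    }
}
```

Going one level at a time means each page is reached at its shallowest depth, so the depth cut-off does not depend on the order in which links happen to appear.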

Roland

-----Original Message-----
From: [EMAIL PROTECTED] [mailto: [EMAIL PROTECTED]] On Behalf
Of [EMAIL PROTECTED]
Sent: Friday, June 03, 2005 5:36 PM
To: [email protected]
Subject: [CFCDev] Spider






I'm trying to write a CFC that will spider a website and create an
inventory of all the pages/files on the website.  It's a fairly simple
program but awfully slow.  I build the page list in a structure called
request.tree.  Here is the function:


      <cffunction name="get">
            <cfargument name="incomingURL" type="string">
            <!--- level is passed by the recursive call below, so it is declared here --->
            <cfargument name="level" type="numeric" default="0">
            <cfset var local=structNew()>

            <cfhttp url="#arguments.incomingURL#" resolveurl="yes"/>
            <cfscript>
                  local.fileContent=cfhttp.fileContent;
                  request.tree[arguments.incomingURL]=structnew();
                  request.tree[arguments.incomingURL].linksArray=arraynew(1);
                  request.tree[arguments.incomingURL].hash=hash(local.fileContent);
                  local.startLink=findnocase('http://',local.fileContent,1);
                  while (local.startLink)
                        {
                        local.endlink=min(findnocase('>',local.fileContent,local.startLink),findnocase('
',local.fileContent,local.startLink));
                        local.link=trim(mid(local.fileContent,local.startLink,local.endlink-local.startLink));
                        local.link=replace(local.link,chr(34),'',"ALL");
                        local.link=replace(local.link,'>','',"ALL");
                        local.link=replace(local.link,chr(32),'',"ALL");
                        arrayappend(request.tree[arguments.incomingURL].linksArray,local.link);
                        if ( local.link contains request.base and not structkeyexists(request.tree,local.link) )
                              {
                              get(incomingURL=local.link,level=arguments.level+1);
                              }
                        local.startLink=findnocase('http://',local.fileContent,local.endlink);
                        }
            </cfscript>

            <cfreturn />
      </cffunction>


Unfortunately, it's painfully slow even for fairly simple sites.  Can
anybody make any suggestions?

Jason Cronk
[EMAIL PROTECTED]




----------------------------------------------------------
You are subscribed to cfcdev. To unsubscribe, send an email to
[email protected] with the words 'unsubscribe cfcdev' as the subject of the
email.

CFCDev is run by CFCZone (www.cfczone.org) and supported by CFXHosting
(www.cfxhosting.com).

CFCDev is supported by New Atlanta, makers of BlueDragon
http://www.newatlanta.com/products/bluedragon/index.cfm

An archive of the CFCDev list is available at
www.mail-archive.com/[email protected]









--
[EMAIL PROTECTED]
http://blog.rawlinson.us
