Upayavira wrote, On 05/03/2003 16.33:
Nicola Ken wrote:

So, from a quick glance, it seems that the way it's done is IMHO the
right way.

Glad you think so!

:-D


Upayavira wrote in bugzilla:

I ran this on a site I built some time ago (with nasty things like javascript: links), and got files generated for:


#US
#nonUS
http_
javascript_form.submit()
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http_\www.magamall.com=client
javascript_form.submit()

None of which should have been generated. I would therefore ignore links that begin with #, javascript: or mailto:.
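A minimal sketch of the kind of check suggested here, assuming a hypothetical helper class (LinkFilter and shouldFollow are illustrative names, not part of the Cocoon CLI). It skips links that begin with #, javascript: or mailto:, comparing case-insensitively so that "Javascript:" is caught as well:

```java
import java.util.Locale;

// Hypothetical helper, not an existing Cocoon class: decides whether the
// crawler should try to generate a file for a link found in a page.
final class LinkFilter {
    private LinkFilter() {}

    static boolean shouldFollow(String href) {
        if (href == null) {
            return false;
        }
        // Normalise so that "Javascript:" and " mailto:" are also recognised.
        String link = href.trim().toLowerCase(Locale.ROOT);
        if (link.isEmpty()) {
            return false;
        }
        // Fragment-only, script and mail links never map to a generated page.
        return !(link.startsWith("#")
                || link.startsWith("javascript:")
                || link.startsWith("mailto:"));
    }
}
```

With this, LinkFilter.shouldFollow("#US") and LinkFilter.shouldFollow("javascript:form.submit()") are both rejected, while a relative link such as "index.html" is still followed.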

I see... what does the "old" CLI do? If it doesn't do it, why? Hmmm...


The controversial one I presume is ignoring links that begin with http://. We could get around this by adding a configuration parameter that specifies the name of the server that the site is based upon. So, when generating the Cocoon site, we could specify that URIs that begin with http://xml.apache.org/cocoon should be spidered, but references to (for example) http://www.w3.org should be ignored. By default, I'd just ignore any links that begin with http://.
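Roughly, the rule could look like the sketch below. ExternalLinkPolicy, shouldSpider and the siteBase value are assumptions made up for illustration; how the parameter would actually be named and read from the configuration is not decided here. A null siteBase stands for the proposed default of ignoring every link that begins with http://.

```java
// Hypothetical sketch, not an existing Cocoon class.
final class ExternalLinkPolicy {
    // URI prefix of the site being generated, e.g. "http://xml.apache.org/cocoon".
    // null means: ignore every absolute http:// link (the proposed default).
    private final String siteBase;

    ExternalLinkPolicy(String siteBase) {
        this.siteBase = siteBase;
    }

    boolean shouldSpider(String href) {
        // Case-insensitive test for a leading "http://" (7 characters).
        boolean absolute = href.regionMatches(true, 0, "http://", 0, 7);
        if (!absolute) {
            return true; // relative links always belong to the site
        }
        // Absolute links are followed only when they fall under the configured base.
        return siteBase != null && href.startsWith(siteBase);
    }
}
```

Note that a plain prefix test would also accept something like http://xml.apache.org/cocoon-other, so a real implementation would probably want to compare on a path boundary.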

I haven't tried this using the old behaviour. I will, and will let you know.

Ah, ok :-)


Also, have you yet measured the speed increases?

I haven't measured it. I'll do that and report back. I'll add some code to report the time taken to generate the site (much like the build script).
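Something along these lines would do for the timing, as a rough sketch only. GenerationTimer and timeMillis are hypothetical names, and the Runnable stands in for whatever call actually generates the site:

```java
// Hypothetical helper, not part of the Cocoon CLI: measures how long a task takes.
final class GenerationTimer {
    private GenerationTimer() {}

    // Runs the task and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime(); // monotonic, unaffected by clock changes
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

The returned value could then be printed at the end of the run, much as the build script reports its total time.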

Very good. Can't wait to see a real-life comparison report :-)


Is it possible to also have the same 3-step behaviour there was
before?

Yes. I've left the original behaviour as the default. All other behaviours can be configured in the xconf file.

Excellent :-)


< happy but frustrated I can't get my hands on it to test it,
  just email...  grrrrr ;-))  >

--
Nicola Ken Barozzi                   [EMAIL PROTECTED]
            - verba volant, scripta manent -
   (discussions get forgotten, just code remains)