Nicola Ken wrote:
> So, from quick glance, it seems that the way it's done is IMHO the
> right way.

Glad you think so!

> Upayavira wrote in bugzilla:
> "
> This code appears to try to check pages that begin with #, javascript:
> or http://. I plan to prevent this, and probably sort other things
> too, but I'd like to see what people think of this code before I do
> anything else. "
> Could you please explain it a bit more, and the changes you'd like to
> make. Especially, is this behaviour different from the previous one?

I ran this on a site I built some time ago (with nasty things like
Javascript: links), and got files generated for:

  #US
  #nonUS
  http_
  javascript_form.submit()
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
  [EMAIL PROTECTED]
  http_\www.magamall.com=client
  javascript_form.submit()

None of these should have been generated. I would therefore ignore
links that begin with #, javascript: or mailto:.

The controversial one, I presume, is ignoring links that begin with
http://. We could get around this by adding a configuration parameter
that specifies the name of the server that the site is based upon. So,
when generating the Cocoon site, we could specify that URIs that begin
with http://xml.apache.org/cocoon should be spidered, but references to
(for example) http://www.w3.org should be ignored. By default, I'd just
ignore any links that begin with http://.

I haven't tried this using the old behaviour. I will, and will let you
know.

> Also, have you yet measured the speed increases?

I haven't measured it. I'll do that and report back. I'll add some code
to report the time taken to generate the site (much like the build
script).

> Is it possible to also have the same 3 step behaviour there was
> before?

Yes. I've left the original behaviour as the default. All other
behaviours can be configured in the xconf file.

Regards,
Upayavira
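
For illustration only, here is a minimal sketch of the link-filtering rule described above. The class name LinkFilter, the method shouldFollow and the siteBase parameter are hypothetical; they are not taken from the Cocoon code or from any existing xconf option. The sketch assumes a plain prefix check is enough and that a null siteBase means "ignore every http:// link", which is the default proposed above.

```java
import java.util.Locale;

// Hypothetical helper, not part of Cocoon: decides whether the crawler
// should follow (and generate a file for) a link found in a page.
final class LinkFilter {

    private LinkFilter() {
    }

    // href:     the raw link value taken from the page
    // siteBase: assumed configuration value such as
    //           "http://xml.apache.org/cocoon"; null means no absolute
    //           http:// link is ever followed
    static boolean shouldFollow(String href, String siteBase) {
        if (href == null) {
            return false;
        }
        // Compare case-insensitively so "Javascript:" is caught as well.
        String link = href.trim().toLowerCase(Locale.ROOT);
        if (link.isEmpty()) {
            return false;
        }
        // Fragments, script pseudo-links and mail addresses never map to pages.
        if (link.startsWith("#")
                || link.startsWith("javascript:")
                || link.startsWith("mailto:")) {
            return false;
        }
        // Absolute links are followed only when they point into the
        // configured site; everything else external is ignored.
        if (link.startsWith("http://")) {
            return siteBase != null
                    && link.startsWith(siteBase.toLowerCase(Locale.ROOT));
        }
        // Anything left is treated as a relative link within the site.
        return true;
    }
}
```

With siteBase set to http://xml.apache.org/cocoon, a call such as LinkFilter.shouldFollow("http://www.w3.org/TR/xslt", siteBase) returns false, while a relative link like "index.html" is still followed. A bare prefix check would also accept a sibling path such as http://xml.apache.org/cocoon2, so a real implementation would probably want to compare on a path boundary.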