On 26 Jun 2003 at 19:34, Nicola Ken Barozzi wrote:

> >> There are quite a lot of new features in the Cocoon CLI that
> >> Forrest isn't using, for example the option to switch off mime-type
> >> checking, and to only scan pages once (i.e. not using the
> >> link-view) to follow links.
> >
> > We have to use them, I agree. In fact Forrest was the primary use
> > case of having more speed with the new CLI and for not using the
> > mimetype thing.
>
> Oh, and also not having error pages generated where there are errors,
> so that a link checker run on live sites can see the real broken
> links.

Yup. You can have it generate an error page, or not, and you can choose
whether to generate a broken links file as text or as XML.

> >> I believe there are still some problems with these new features in
> >> the CLI, but it should be possible to fix these. [For example,
> >> links being gathered on pipelines referenced via cocoon: protocol -
> >> I've found why, but not yet fixed it].
>
> In fact the Forrest site cannot be generated with this CLI method, as
> it does not play well with link rewriting because of the above bug. I
> guess it's because you insert the gatherer *before* the rewriting,
> because of the cocoon: protocol usage IIUC as you say. The weird thing
> is that the page gets rendered right, I guess it's just the
> cocoon:-called pipeline that complains.

What happens is that the link gatherer is added before each serializer,
which _includes_ before the (non-called) serializer for the cocoon:
pipeline. I need to work out how to find out whether a pipeline is
internal or not, in
o.a.c.components.treeprocessor.sitemap.SerializeNode.java. If it is
internal, the link gatherer should not be inserted. [Or, we could choose
to make it explicit, so that it needs to be inserted into the pipeline
in the sitemap].

> Then there is again the recursion bug, that makes links get gathered
> in a recursive manner, making them longer and longer and longer...

I'm not sure about this one. I think I've seen something like this even
when spidering with wget, so I wonder what this is to do with. Any
ideas?

> IIRC we had this before, wasn't it already fixed?

I didn't fix it.

> >> Is anyone interested in looking into how to upgrade Forrest to use
> >> these new features?
> >>
> >> I think that doing this would stand a chance of resolving all of
> >> Luc's problems, and give me some people to do some solid debugging
> >> of the CLI.
>
> To enable this method, users just need to do this:
>
> - go in the dist/shbat dir
> - edit forrest.build.xml
> - insert the following line in the Cocoon args:
>   <arg value="-efalse"/>

Or start using the cli.xconf file - it'll give finer grained control as
the CLI improves.

Regards, Upayavira
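
For anyone wanting to picture the "skip the gatherer on internal
pipelines" idea discussed above, here is a rough toy sketch. It is not
Cocoon code: the class name, the method and the component names
("link-gatherer", "serializer") are all made up for illustration, and
how SerializeNode could actually learn that a pipeline is internal is
still the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names come from Cocoon itself.
public class GathererSketch {

    // Returns a copy of the component list with a (hypothetical)
    // "link-gatherer" step placed immediately before each serializer,
    // unless the pipeline is internal, i.e. only reached via cocoon:.
    public static List<String> withGatherer(List<String> components,
                                            boolean internal) {
        List<String> result = new ArrayList<>();
        for (String component : components) {
            if (!internal && component.equals("serializer")) {
                result.add("link-gatherer");
            }
            result.add(component);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> pipeline =
            Arrays.asList("generator", "transformer", "serializer");
        // External pipeline: the gatherer is inserted.
        System.out.println(withGatherer(pipeline, false));
        // Internal pipeline: the component list is left alone.
        System.out.println(withGatherer(pipeline, true));
    }
}
```

The alternative mentioned above, making the gatherer explicit in the
sitemap, would drop the boolean altogether and leave the placement to
whoever writes the pipeline.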