On 26 Jun 2003 at 19:34, Nicola Ken Barozzi wrote:

> >> There are quite a lot of new features in the Cocoon CLI that
> >> Forrest isn't using, for example the option to switch off mime-type
> >> checking, and to only scan pages once (i.e. not using the
> >> link-view) to follow links.
> >  
> > We have to use them, I agree. In fact Forrest was the primary use
> > case of having more speed with the new CLI and for not using the
> > mimetype thing.
> 
> Oh, and also not having error pages generated where there are errors,
> so that a link checker run on live sites can see the real broken
> links.

Yup. You can have it generate an error page, or not, and you can choose
whether to generate a broken links file as text or as XML.

> >> I believe there are still some problems with these new features in
> >> the CLI, but it should be possible to fix these. [For example,
> >> links being gathered on pipelines referenced via cocoon: protocol -
> >> I've found why, but not yet fixed it].
> 
> In fact the Forrest site cannot be generated with this CLI method, as
> it does not play well with link rewriting because of the above bug. I
> guess it's because you insert the gatherer *before* the rewriting,
> because of the cocoon: protocol usage IIUC as you say. The weird thing
> is that the page gets rendered right, I guess it's just the
> cocoon:-called pipeline that complains.

What happens is that the link gatherer is added before each serializer,
and that _includes_ the serializer of the cocoon: pipeline, even though
that serializer is never actually called. I need to work out how to find
out, in o.a.c.components.treeprocessor.sitemap.SerializeNode.java,
whether a pipeline is internal or not. If it is internal, the link
gatherer should not be inserted.

[Or we could choose to make it explicit, so that the link gatherer has
to be inserted into the pipeline in the sitemap].
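
To make the intended check concrete, here is a rough sketch of the rule
described above. It is only a toy model: the Pipeline class, the
component names and withGatherer() are all made up for illustration and
are not Cocoon classes or the actual SerializeNode code. It assumes we
already know whether a pipeline is internal, which is exactly the part
still to be worked out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GathererSketch {

    /** Hypothetical stand-in for a sitemap pipeline; not a Cocoon class. */
    static final class Pipeline {
        final boolean internal;         // assumed: true when called via cocoon:
        final List<String> components;  // component names, serializer last

        Pipeline(boolean internal, List<String> components) {
            this.internal = internal;
            this.components = new ArrayList<>(components);
        }
    }

    /**
     * Returns the component list with a "link-gatherer" step placed just
     * before the final serializer, unless the pipeline is internal, in
     * which case the list is returned unchanged.
     */
    static List<String> withGatherer(Pipeline p) {
        List<String> out = new ArrayList<>(p.components);
        if (p.internal || out.isEmpty()) {
            return out;  // internal pipelines get no gatherer
        }
        out.add(out.size() - 1, "link-gatherer");
        return out;
    }

    public static void main(String[] args) {
        Pipeline external = new Pipeline(false,
                Arrays.asList("generate", "transform", "serialize"));
        Pipeline internal = new Pipeline(true,
                Arrays.asList("generate", "serialize"));

        // prints [generate, transform, link-gatherer, serialize]
        System.out.println(withGatherer(external));
        // prints [generate, serialize]
        System.out.println(withGatherer(internal));
    }
}
```

The explicit alternative would drop the automatic insertion entirely and
leave it to the sitemap author to place the gatherer.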
 
> Then there is again the recursion bug, that makes links get gathered
> in a recursive manner, making them longer and longer and longer...

I'm not sure about this one. I think I've seen something like this even
when spidering with wget, so I wonder what is causing it. Any ideas?

> IIRC we had this before, wasn't it already fixed?

I didn't fix it.
 
> >> Is anyone interested in looking into how to upgrade Forrest to use
> >> these new features?
> >>
> >> I think that doing this would stand a chance of resolving all of
> >> Luc's problems, and give me some people to do some solid debugging
> >> of the CLI.
> 
> To enable this method, users just need to do this:
> 
>   - go in the dist/shbat dir
>   - edit forrest.build.xml
>   - insert the following line in the Cocoon args:
>      <arg value="-efalse"/>

Or start using the cli.xconf file; it will give finer-grained control as
the CLI improves.

Regards, Upayavira
