Ross Gardler wrote:
> Thorsten Scherler wrote:
> >David Crossley wrote:
> >>David Crossley wrote:
> >>>Ross Gardler wrote:
> >>>
> >>>>Is anyone familiar with configuration of the Cocoon crawler? We need to
> >>>>modify it so that it will follow links defined in whatever format the
> >>>>output document creates rather than just HTML format documents.
> >>>
> >>>In our main/webapp/WEB-INF/cli.xconf
> >>>
> >>> | confirm-extensions: check the mime type for the generated page
> >>> | and adjust filename and links extensions
> >>> | to match the mime type
> >>> | (e.g. text/html->.html)
> >>>
> >>>at the moment it is set to false.
> >>>
> >>>I have never understood how to use it.
> >>>
> >>>Are you suggesting that we might be able to get rid of
> >>>the need for responding on filename extensions.
> >>>
> >>>http://cocoon.apache.org/2.1/userdocs/offline/
> >>>http://wiki.apache.org/cocoon/CommandLine
> >>>
> >>>I notice from those docs that the default is
> >>>confirm-extensions=true (opposite to us).
> >>
> >>I tried this today ...
> >>
> >>Edit main/webapp/WEB-INF/cli.xconf and
> >>set "confirm-extensions=true".
> >>
> >>Do 'forrest site' ...
> >>
> >>* [1/0] [0/0] 5.633s 10.5Kb linkmap.html
> >>Total time: 0 minutes 7 seconds, Site size: 10,782 Site pages: 1
> >>
> >>So it processed the first page but did not gather any links
> >>from the page (the third column numbers are empty).

Perhaps internally Cocoon is now appending a filename extension,
which confuses the linkgatherer. I don't even know whether
"confirm-extensions" does that. Someone should look at the Cocoon code.

> >>Unfortunately we cannot see any logs in 'forrest site' mode
> >>due to issue:

I cannot find the Jira issue. It does make debugging
very difficult.

> >Just a shot in the dark, we have/had a similar problem in v2. The
> >crawler expect certain markup such as <a href=""/> AFAIR.
>
> According to the CLI docs (if I remember correctly) the crawler should
> follow links in @href, @src, etc. regardless of the parent element.

I think that is tangential. My experiment is with our
site-author docs. They have stacks of links that are
normally processed.

[ snip ]

> What is forrest run doing?

All is okay in 'forrest run' mode, because cli.xconf is the
Cocoon command-line configuration, i.e. it only applies
to 'forrest site'.

-David
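
For anyone following along, the behaviour Ross recalls from the CLI docs (follow @href, @src, etc. regardless of the parent element) can be sketched roughly as below. This is only an illustration in Python and not Cocoon's actual linkgatherer code; the names LINK_ATTRS, LinkGatherer and gather_links are hypothetical, and the attribute set is an assumption.

```python
from html.parser import HTMLParser

# Hypothetical illustration only; this is not how Cocoon implements it.
# Assumed set of attributes that are treated as links.
LINK_ATTRS = {"href", "src"}


class LinkGatherer(HTMLParser):
    """Collect link targets from any element, not just <a>."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The element name is deliberately ignored: only the attribute
        # name decides whether a value counts as a link.
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                self.links.append(value)


def gather_links(markup):
    """Return link targets in document order."""
    parser = LinkGatherer()
    parser.feed(markup)
    parser.close()
    return parser.links
```

With a gatherer like this, an empty link list for a page full of links would point at the input the gatherer receives rather than at the markup itself, which fits the experiment above.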