On 1 Jul 2003 at 14:47, Vadim Gritsenko wrote:

> Jeff Turner wrote:
>
> > I'm not very familiar with the code; is there some cost in keeping
> > the two-pass CLI alive, in the faint hope that caching comes to its
> > rescue one day?
>
> Guys,
>
> Before you implement some approach here... let me suggest something.
>
> Right now the sitemap implementation automatically adds a link
> gatherer to the pipeline when it is invoked by the CLI. This link
> gatherer is in fact a "hard-coded links view". I suggest replacing
> this "hard-coded links view", a.k.a. the link gatherer, with the
> "real" links view, BUT attaching it as a tee to the main pipeline
> instead of running it as a pipeline by itself. As a result, the
> links view "baby" will be kept, the two-pass "water" will be
> drained, and the sitemap syntax will stay the same. Moreover, the
> links view will still be accessible from the outside, meaning that
> you can spider the site using out-of-process spiders.
>
> Example:
>
> Given the pipeline:
>   G --> T1 (label="content") --> T2 --> S,
>
> And the links view:
>   from-label="content" --> T3 --> LinkSerializer,
>
> The pipeline built for the CLI request should be:
>   G --> T1 --> Tee --> T2 --> S --> OutputStream
>                 \
>                  --> LinkSerializer --> NullOutputStream
>                        \
>                         --> List of links in environment
>
> In one request, you will get:
> * Regular output of the pipeline, which will go to the destination
>   Source
> * The list of links in the environment, which is what the link
>   gatherer was made for
Splendid. I think that is exactly what I would want to do. We'd then
have single(ish)-pass generation with the benefits of the links view.
And if you just feed directly from the label into a serializer, it'll
be pretty much the same in terms of performance as the LinkGatherer
that we have now.

I would need help implementing this. Are you able to explain how?
There's a lot of pipeline building there that I wouldn't yet know how
to do (but I'm willing to give it a go with guidance).

If we're to use my current approach, we'd add a different serializer
at the end of the second sub-pipe, which would take the links and put
them into a specific List in the ObjectModel. In fact, we could
create a LinkGatheringOutputStream that'd be handed to the
LinkSerializer to do that (I've put rough sketches of the tee and of
that stream in the P.S. below). That would leave most of the
complexity simply in building the pipeline. Can you guarantee that
cocoon.process() will not complete until both sub-pipelines have
completed their work?

I'll take a bit of a look into the pipeline-building code (if I can
find it) to see what I can work out. This approach excites me. With
help, I'd like to see if I can make it happen.

Regards, Upayavira
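P.S. To check that I've understood the tee: below is a rough,
untested sketch of what I'm picturing, written against the plain
org.xml.sax ContentHandler interface rather than Cocoon's real
XMLConsumer/XMLPipe API, and with made-up class names. It simply
forwards every SAX event to two consumers: the rest of the main
pipeline (T2 --> S) and the links view branch (T3 --> LinkSerializer).

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;

/** Illustrative sketch only: duplicates every SAX event to two handlers. */
public class SaxTee implements ContentHandler {

    private final ContentHandler main;   // rest of the main pipeline (T2 --> S)
    private final ContentHandler links;  // links view branch (T3 --> LinkSerializer)

    public SaxTee(ContentHandler main, ContentHandler links) {
        this.main = main;
        this.links = links;
    }

    public void setDocumentLocator(Locator l) {
        main.setDocumentLocator(l); links.setDocumentLocator(l);
    }
    public void startDocument() throws SAXException {
        main.startDocument(); links.startDocument();
    }
    public void endDocument() throws SAXException {
        main.endDocument(); links.endDocument();
    }
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        main.startPrefixMapping(prefix, uri); links.startPrefixMapping(prefix, uri);
    }
    public void endPrefixMapping(String prefix) throws SAXException {
        main.endPrefixMapping(prefix); links.endPrefixMapping(prefix);
    }
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        main.startElement(uri, local, qName, atts);
        links.startElement(uri, local, qName, atts);
    }
    public void endElement(String uri, String local, String qName) throws SAXException {
        main.endElement(uri, local, qName); links.endElement(uri, local, qName);
    }
    public void characters(char[] ch, int start, int len) throws SAXException {
        main.characters(ch, start, len); links.characters(ch, start, len);
    }
    public void ignorableWhitespace(char[] ch, int start, int len) throws SAXException {
        main.ignorableWhitespace(ch, start, len); links.ignorableWhitespace(ch, start, len);
    }
    public void processingInstruction(String target, String data) throws SAXException {
        main.processingInstruction(target, data); links.processingInstruction(target, data);
    }
    public void skippedEntity(String name) throws SAXException {
        main.skippedEntity(name); links.skippedEntity(name);
    }
}

How the real pipeline assembly would insert such a tee after the
"content" label is exactly the part I'd need guidance on.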
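P.P.S. And here is roughly what I mean by a LinkGatheringOutputStream:
again an untested sketch with made-up names, assuming the
LinkSerializer writes one link per line in a known encoding. Rather
than writing the links to a destination, it collects them into a List
that the CLI could then copy into the ObjectModel/environment.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: an OutputStream handed to the LinkSerializer.
 * Assumes one link per line; gathers the links into a List instead of
 * writing them anywhere.
 */
public class LinkGatheringOutputStream extends OutputStream {

    private final List links = new ArrayList();     // gathered links (Strings)
    private final ByteArrayOutputStream line = new ByteArrayOutputStream();
    private final String encoding;

    public LinkGatheringOutputStream(String encoding) {
        this.encoding = encoding;
    }

    public void write(int b) throws IOException {
        if (b == '\n') {
            flushLine();
        } else if (b != '\r') {
            line.write(b);
        }
    }

    public void close() throws IOException {
        flushLine();   // catch a trailing link without a newline
        super.close();
    }

    private void flushLine() throws IOException {
        if (line.size() > 0) {
            String link = new String(line.toByteArray(), encoding).trim();
            if (link.length() > 0) {
                links.add(link);
            }
            line.reset();
        }
    }

    /** The links gathered so far; the CLI would copy these into the environment. */
    public List getLinks() {
        return links;
    }
}

Whether the real LinkSerializer actually writes one link per line, and
where exactly in the CLI the resulting List should be stashed, are
both things I'd need to check.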