Jeff Turner wrote:

I'm not very familiar with the code; is there some cost in keeping the
two-pass CLI alive, in the faint hope that caching comes to its rescue
one day?


Guys,


Before you implement some approach here... Let me suggest something.

Right now the sitemap implementation automatically adds a link gatherer to the pipeline when it is invoked by the CLI. This link gatherer is in fact a "hard-coded links view". I suggest replacing this "hard-coded links view" (a.k.a. the link gatherer) with the "real" links view, BUT attaching it as a tee to the main pipeline instead of running it as a pipeline by itself. As a result, the links view "baby" will be used, the two-pass "water" will be drained, and the sitemap syntax will stay the same. Moreover, the links view will still be accessible from the outside, meaning that you can still spider the site using out-of-process spiders.

Example:
Given the pipeline:
 G --> T1 (label="content") --> T2 --> S,

And the links view:
 from-label="content" --> T3 --> LinkSerializer,

The pipeline built for the CLI request should be:
 G --> T1 --> Tee --> T2 --> S --> OutputStream
                \
                  --> LinkSerializer --> NullOutputStream
                          \
                            --> List of links in environment

In one request, you will get:
* Regular output of the pipeline, which will go to the destination Source
* List of links in the environment, which is what the link gatherer was made for
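
To make the tee idea concrete, here is a rough sketch using plain SAX from the JDK. It is not Cocoon code: the class names Tee, TextSerializer and LinkGatherer are made up for illustration, the "serializer" only collects text, and the "environment" is just a List. It only shows that one pass over the events can feed both the regular output and the link list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TeeSketch {

    /** Hypothetical tee: forwards element and text events to both branches. */
    static class Tee extends DefaultHandler {
        private final ContentHandler main;
        private final ContentHandler side;

        Tee(ContentHandler main, ContentHandler side) {
            this.main = main;
            this.side = side;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            main.startElement(uri, local, qName, atts);
            side.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            main.endElement(uri, local, qName);
            side.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            main.characters(ch, start, length);
            side.characters(ch, start, length);
        }
    }

    /** Stand-in for "T2 --> S --> OutputStream": keeps only the text content. */
    static class TextSerializer extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Stand-in for the links view: records every href attribute it sees. */
    static class LinkGatherer extends DefaultHandler {
        final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    /** Parses the XML once and pushes the events through the tee. */
    static void run(String xml, ContentHandler main, ContentHandler side) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new Tee(main, side));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<page><a href=\"a.html\">A</a> and <a href=\"b.html\">B</a></page>";
        TextSerializer page = new TextSerializer();
        LinkGatherer gatherer = new LinkGatherer();
        run(xml, page, gatherer);
        System.out.println(page.out);        // regular output of the pipeline
        System.out.println(gatherer.links);  // list of links gathered in the same pass
    }
}
```

In the real thing the side branch would be whatever the links view defines (T3 --> LinkSerializer above), and the list would be stored in the environment rather than in a field.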

Comments?

Vadim



