Jeff wrote:

> > So are you saying you can manage without the XSLT stage?
> 
> I'm not sure, perhaps you can advise.  In Forrest we filter the links
> to:
> 
>  - Remove API doc links
>  - Remove links to directories, which break the CLI
>  - Remove image links that have been hacked to work with FOP
> 
> 1) belongs in cli.xconf.  Perhaps the new CLI handles 2) better than
> the original.  I think 3) is obsolete, as LinkSerializer ignores
> XSL:FO-namespaced links anyway.
> 
> > Perhaps I should explain what I had in mind a bit more with that - I
> > guess I would call it a tee, a pipeline element with one input and
> > two outputs. The input is passed unchanged on through to the next
> > stage in the pipeline. But it is also passed through an XSLT before
> > links are gathered from it.
> 
> I'd call it a hack ;)  Why favour XSLT and not STX, or any other
> transformer?  What about XSLT parameters? etc.  If people need XSLT,
> let them use a link view.  I'd suggest just sticking with the basics:
> <map:transform type="gather-links"/>

Okay. How about defining a namespace - say <links:link href="xxxx"/> - which gets 
consumed by the transformer? That way you choose in your upstream XSLT which links 
you want spidered, by presenting them in that <links> namespace (and then repeating 
them as ordinary links for the sake of the output).
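
To make that concrete, the upstream XSLT could do something roughly like the 
following. This is only a sketch: the namespace URI is a placeholder, and matching 
on HTML <a> elements is just an example of "whatever links you care about":

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:links="http://example.org/links/1.0">

    <!-- pass everything through unchanged -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>

    <!-- for each link we want spidered, emit a links:link element
         as well as the ordinary link for the output -->
    <xsl:template match="a[@href]">
      <links:link href="{@href}"/>
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>

  </xsl:stylesheet>

The gathering transformer would then just record the href of every links:link it 
sees and strip those elements out of the stream, so the serialized output is 
untouched.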

This would be an extremely simple transformer to write. Beyond writing the 
transformer itself, it would take only minimal changes (half an hour's work, say) to 
the rest of the CLI.
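
On the sitemap side it would be nothing more than dropping the transformer in after 
the stylesheet that emits the links:link elements - roughly like this (pattern and 
file names made up for illustration):

  <map:match pattern="**.html">
    <map:generate src="content/{1}.xml"/>
    <map:transform src="stylesheets/document2html.xsl"/>
    <!-- consumes links:link elements and records their hrefs -->
    <map:transform type="gather-links"/>
    <map:serialize type="html"/>
  </map:match>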

> Which isn't a hack.  In fact it would be great for Forrest, because we
> only have a few matchers where links are relevant.  All the cocoon:
> and image pipelines could go without.

Yup.

> Also, it resolves another little dilemma I've had with link views. 
> It's all very well having the notion of a cross-cutting 'view', but
> there's no way to override the 'view' for a specific pipeline.  With
> an explicit gather-links transformer, one could have different link
> analysis for each pipeline.  A *.css pipeline could list @import's as
> links, for example.

Great.
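
For instance, the *.css pipeline could get its own gatherer for @import statements - 
something like this, where the "gather-css-links" type and the rest of the pipeline 
are invented purely to illustrate the per-pipeline idea:

  <map:match pattern="**.css">
    <map:generate type="text" src="resources/{1}.css"/>
    <!-- hypothetical transformer that reports @import targets as links -->
    <map:transform type="gather-css-links"/>
    <map:serialize type="text"/>
  </map:match>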

> > > It certainly fixes the hard-wired'ness problem you mention above
> > > (that 'content' != XML before the serializer).
> > 
> > And it sounds as if it could be a trivial solution.
> 
> 'Solves' the cocoon: sub-pipeline problem too.

Yup.

Now the only question that remains is whether to have an implicit gatherer when no 
explicit one is specified. I'd probably say no, as other discussions have steered 
away from hidden behaviour like that.

I think that telling the sitemap where your links are is a pretty reasonable 
adjustment to your site. In fact, we could have two transformers - one that just 
looks for hrefs and xlinks, and another that uses a links namespace. The former 
would make it really easy to convert your site for spidering, and the latter would 
provide a way to do complex link management.
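
In sitemap terms that would be something like the following (both type names are 
invented here; only the second would consume links:link elements):

  <!-- easy conversion: gather anything that looks like a link -->
  <map:transform type="gather-links"/>

  <!-- fine-grained control: gather only the links:link elements
       that the upstream XSLT chose to expose -->
  <map:transform type="gather-namespaced-links"/>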

Another question - do we still leave link view (two pass) link following in the CLI? 
Or does this method deprecate and thus replace it?

Thanks for engaging with me on this - I appreciate it.

Regards, Upayavira