On Sun, Jun 29, 2003 at 09:08:14PM +1200, Conal Tuohy wrote:
> Jeff Turner wrote:
> >
> > > That's an issue I've come up against too - it seems that views are
> > > still too "tangled" up with labels and can't cut across pipelines
> > > properly. At least, that's how I understand it - maybe I'm missing
> > > something?
> >
> > I think labels and Views are independent of each other. You can have
> > a view defined with 'from-position', and not use labels. Labels are
> > just generic markers, with nothing to say they're only useful for
> > defining views.
>
> But with from-position you can have only "first" and "last" which is
> even more restrictive than labels. If you want to do anything very
> sophisticated don't you need labels?

Yes, labels and positions. What else could there be?

> > Views give _every_ public URL in a sitemap an alternative form. If
> > you only need an alternative form of some URLs, then that can be done
> > just as you've described above, with a request-param selector.
>
> So ... I could just have use a RequestParamSelector to create my
> different views for the crawler? Damn!

I doubt it. I was just describing when you'd want to use views at all.
The old CLI chose to use views, which means there's no option for
per-pipeline customization.

> My problem was that I wanted to use Lucene to index a "content" view of
> 2 different pipelines, one of them based on TEI and another on HTML. In
> the case of the TEI pipeline I didn't want to convert the TEI to HTML
> first and then produce a "content" view based on an HTML-ized view of
> the TEI - I wanted an indexable view of the TEI. This is the same issue
> as you mention below:
>
> > The problem is that Views don't know the type of data they're
> > getting. If we have a view with from-label="content", we know it's
> > content, but what _type_ of content? What schema? What
> > transformation can we apply to create a links-view of this content?
>
> If you could create more than one view with the same name, then we
> could use labels to specify the schema:
>
> e.g. 2 pipelines containing:
> ...
> <map:generate src="{1}.xml" label="tei"/>
> ...
>
> and
>
> <map:transform src="blah-to-html.xsl" label="html"/>
>
> ... and 2 views called "content", one with from-label="tei" and the
> other with from-label="html".

Technically that's more or less the solution.

I think a cleaner way of presenting it is to have one view that
interprets different kinds of data differently:

  <map:view name="links" from-label="content">
    <map:select type="xml-type">
      <map:when test="html">
        <map:transform src="html2whatever.xsl"/>
      </map:when>
      <map:when test="tei">
        <map:transform src="tei2whatever.xsl"/>
      </map:when>
    </map:select>
  </map:view>

So this treats 'type' as a property of a sitemap component, independent
of labels. The xml-type selector would somehow discover the type of XML
emitted by its upstream component.

--Jeff

> Cheers
>
> Con
>
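
One way the "somehow discover the type" step might work is to look at the root element of the XML coming from upstream. The following is only a rough sketch of that idea in Python, not Cocoon code: the function name detect_xml_type and the ROOT_TYPES table are hypothetical, and the mapping of root element names to the "html" and "tei" test values is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical table: local name of the root element -> type name used in
# <map:when test="...">. The entries are illustrative assumptions only.
ROOT_TYPES = {
    "html": "html",
    "TEI.2": "tei",  # root element of TEI P4 documents
    "TEI": "tei",    # root element of TEI P5 documents
}


def detect_xml_type(data: bytes) -> str:
    """Guess a document's type from the local name of its root element."""
    # The first "start" event reported by iterparse is the root element,
    # so the loop body returns on its first iteration.
    for _event, elem in ET.iterparse(BytesIO(data), events=("start",)):
        local_name = elem.tag.rsplit("}", 1)[-1]  # strip "{namespace}" if any
        return ROOT_TYPES.get(local_name, "unknown")
    return "unknown"


if __name__ == "__main__":
    print(detect_xml_type(b"<html><body/></html>"))
    print(detect_xml_type(b"<TEI.2><text/></TEI.2>"))
```

Inside a sitemap the same check would presumably have to run on the event stream from the upstream component rather than on a byte buffer, and a namespace or DOCTYPE test might be more robust than the element name alone.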