On Sun, Jun 29, 2003 at 09:08:14PM +1200, Conal Tuohy wrote:
> Jeff Turner wrote:
> 
> > > That's an issue I've come up against too - it seems that views are
> > > still too "tangled" up with labels and can't cut across pipelines
> > > properly. At least, that's how I understand it - maybe I'm missing
> > > something?
> >
> > I think labels and Views are independent of each other.  You can have
> > a view defined with 'from-position', and not use labels.  Labels are
> > just generic markers, with nothing to say they're only useful for
> > defining views.
> 
> But with from-position you can have only "first" and "last" which is
> even more restrictive than labels. If you want to do anything very
> sophisticated don't you need labels?

Yes, labels and positions.  What else could there be?
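
For instance, a rough sketch of both kinds of view side by side (the
view names and serializer types are only illustrative, not taken from
any particular sitemap):

<map:views>
  <!-- Label-based: taps each pipeline at the component labelled "content" -->
  <map:view name="content" from-label="content">
    <map:serialize type="xml"/>
  </map:view>
  <!-- Position-based: taps each pipeline just before its serializer, no labels needed -->
  <map:view name="links" from-position="last">
    <map:serialize type="links"/>
  </map:view>
</map:views>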

> > Views give _every_ public URL in a sitemap an alternative form.  If
> > you only need an alternative form of some URLs, then that can be done
> > just as you've described above, with a request-param selector.
> 
> So ... I could just use a RequestParamSelector to create my
> different views for the crawler? Damn!

I doubt it.  I was just describing when you'd want to use views at all.
The old CLI chose to use views, which means there's no option for
per-pipeline customization.
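
For comparison, the per-pipeline approach would look something like
this. It is only a sketch: it assumes the stock request-parameter
selector is declared in map:components, and the 'view' parameter name,
match pattern and stylesheet names are made up for illustration:

<map:match pattern="tei/*.html">
  <map:generate src="{1}.xml"/>
  <map:select type="request-parameter">
    <map:parameter name="parameter-name" value="view"/>
    <map:when test="content">
      <!-- ?view=content: emit an indexable form of the TEI directly -->
      <map:transform src="tei2content.xsl"/>
      <map:serialize type="xml"/>
    </map:when>
    <map:otherwise>
      <!-- Default: the normal HTML rendering -->
      <map:transform src="tei2html.xsl"/>
      <map:serialize type="html"/>
    </map:otherwise>
  </map:select>
</map:match>

That only changes the one pipeline it appears in, and the old CLI
would not know to ask for it.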

> My problem was that I wanted to use Lucene to index a "content" view of
> 2 different pipelines, one of them based on TEI and another on HTML. In
> the case of the TEI pipeline I didn't want to convert the TEI to HTML
> first and then produce a "content" view based on an HTML-ized view of
> the TEI - I wanted an indexable view of the TEI. This is the same issue
> as you mention below:
> 
> > The problem is that Views don't know the type of data they're
> > getting.  If we have a view with from-label="content", we know it's
> > content, but what _type_ of content?  What schema?  What
> > transformation can we apply to create a links-view of this content?
> 
> If you could create more than one view with the same name, then we
> could use labels to specify the schema:
> 
> e.g. 2 pipelines containing:
> ...
> <map:generate src="{1}.xml" label="tei"/>
> ...
> 
> and
> 
> <map:transform src="blah-to-html.xsl" label="html"/>
> 
> ... and 2 views called "content", one with from-label="tei" and the
> other with from-label="html".

Technically that's more or less the solution.  I think a cleaner way
of presenting it is to have one view that interprets different kinds
of data differently:

<map:view name="links" from-label="content">
  <map:select type="xml-type">
    <map:when test="html">
      <map:transform src="html2whatever.xsl"/>
    </map:when>
    <map:when test="tei">
      <map:transform src="tei2whatever.xsl"/>
    </map:when>
  </map:select>
</map:view>

So, treating 'type' as a property of a sitemap component, independent
of labels.  The xml-type selector would somehow discover the type of
XML emitted by its upstream component.
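
The pipelines themselves would then share a single label, roughly
like this (again only a sketch: the match patterns and the
tei-to-html.xsl name are hypothetical, and no xml-type selector
exists yet):

<map:match pattern="tei/*.html">
  <!-- The TEI source itself is the "content" stage -->
  <map:generate src="{1}.xml" label="content"/>
  <map:transform src="tei-to-html.xsl"/>
  <map:serialize type="html"/>
</map:match>

<map:match pattern="html/*.html">
  <map:generate src="{1}.xml"/>
  <!-- Here the "content" stage is the HTML produced by the transform -->
  <map:transform src="blah-to-html.xsl" label="content"/>
  <map:serialize type="html"/>
</map:match>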


--Jeff

> Cheers
> 
> Con
> 
