Re: [fcrepo-user] [fcrepo-dev] GSearch planning

ajs6f Thu, 27 Oct 2011 16:13:05 -0700

> So, to be clear, what you're suggesting is that XSLT is indeed used to 
> generate Solr documents, but that the object being indexed would decide which 
> XSLT was used, by having an explicit link to another object which contains 
> the XSLT as a datastream?

Something like that. We allow any URI to serve as the predicate of the RDF 
triple that tells where to find the transform, but in practice, we want the 
transform's location to be either a datastream or a dissemination (for the 
case of dynamically generated transforms) of a Fedora object.
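
A minimal sketch of that lookup, in Python, assuming the triple lives in the 
object's RELS-EXT datastream as RDF/XML. The predicate URI, the PIDs, and the 
function name are hypothetical, chosen only for illustration; this is not 
GSearch code.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical predicate: any URI could play this role.
TRANSFORM_NS = "http://example.org/indexing#"
TRANSFORM_LOCAL = "hasIndexingTransform"

# Hypothetical RELS-EXT content for an object being indexed.
SAMPLE_RELS_EXT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/indexing#">
  <rdf:Description rdf:about="info:fedora/demo:1">
    <ex:hasIndexingTransform rdf:resource="info:fedora/demo:transforms/SOLR_XSLT"/>
  </rdf:Description>
</rdf:RDF>
"""


def find_transform_uris(rels_ext_xml):
    """Return the object URIs of every triple that uses the transform predicate."""
    root = ET.fromstring(rels_ext_xml)
    predicate_tag = "{%s}%s" % (TRANSFORM_NS, TRANSFORM_LOCAL)
    uris = []
    for description in root.findall("{%s}Description" % RDF_NS):
        for prop in description.findall(predicate_tag):
            resource = prop.get("{%s}resource" % RDF_NS)
            if resource:
                uris.append(resource)
    return uris


transform_uris = find_transform_uris(SAMPLE_RELS_EXT)
```

Here the returned URI names a datastream of another object; a dissemination 
URI would be found the same way.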

It is not difficult to extract technical metadata from an audio datastream, 
but XML is not the source and XSLT is not the tool, so the argument I am making 
is not really about XSLT. I am _not_ suggesting that GSearch should provide 
such functionality, except by using some intermediating, independently 
developed system (e.g. Apache Tika). I _am_ suggesting that we should design 
GSearch so as to provide for that pattern without horrible hackery.

To speak to your last assumption, it is clear that GSearch will in any 
circumstance require some language to express the structure of the workflow 
that uses the indexing configuration. Java is one obvious alternative, and it 
is what GSearch uses now. I think it suffers from a certain flaw _if_ we take 
for granted my claims about the value of storing configuration in the 
repository; Java cannot easily be stored in the repository in a manner 
efficient for execution. What then?

Here's one example, which I think is probably too complex for general use: in 
our development at UVa, we have divided the definition of workflow between 
Apache Camel routers (for the structure of workflow: those parts of the action 
that involve retrieving datastreams from indexed objects, retrieving 
transforms, running the latter against the former, merging fields, etc.) and 
XSLT (for workflow configuration: those parts of the action that work on the 
datastreams). Objects related to a primary object are used in constructing an 
index record when RELS-EXT in the primary object is indexed, and for now it is 
XSLT that must encode for us the manner in which such related objects are 
used. That makes me sad, because XSLT is not a good language in which to 
define workflows, though we must sometimes use it so.
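
The division described above can be sketched roughly as follows. This is plain 
Python, not Apache Camel or XSLT: the function `build_index_record` stands in 
for the router (workflow structure), and the per-datastream callables stand in 
for XSLT transforms (workflow configuration). All names and the sample data 
are hypothetical.

```python
from typing import Callable, Dict, List

Fields = Dict[str, List[str]]
# Stand-in for an XSLT: takes datastream content, returns index fields.
Transform = Callable[[str], Fields]


def merge_fields(records: List[Fields]) -> Fields:
    """Combine partial field sets, concatenating values for repeated names."""
    merged: Fields = {}
    for record in records:
        for name, values in record.items():
            merged.setdefault(name, []).extend(values)
    return merged


def build_index_record(datastreams: Dict[str, str],
                       transforms: Dict[str, Transform]) -> Fields:
    """Workflow structure: pair each datastream with its transform, run it, merge."""
    partial_records = []
    for dsid, content in datastreams.items():
        transform = transforms.get(dsid)
        if transform is None:
            continue  # no transform configured for this datastream
        partial_records.append(transform(content))
    return merge_fields(partial_records)


# Workflow configuration: what to do with each datastream's content.
datastreams = {"DC": "A Title", "RELS-EXT": "info:fedora/demo:2"}
transforms = {
    "DC": lambda text: {"title": [text]},
    "RELS-EXT": lambda text: {"related": [text]},
}
record = build_index_record(datastreams, transforms)
```

Changing what gets indexed means swapping entries in `transforms`; the 
retrieve, run, and merge steps in `build_index_record` stay fixed.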

We've done a little experimentation with the use of ECM Views as a way of 
moving that last piece of the puzzle out of code into RDF in a repository, but 
have no immediate success to report. In any event, it's not clear what the 
current timetable is for ECM Views to move into core repository services, and 
it's not yet completely clear that they can be used to help provide for a 
general configuration mechanism in this way.

My view is that for GSearch, moving the configuration of indexing workflow into 
the repository may or may not mean moving the definition of the workflow 
structure out of Java. If it is to be moved out of Java and into the 
repository, there are a number of possible choices. XSLT is one. It is a bad 
choice because it is not meant for such use and problems arise from that, but 
it is a good choice for all the reasons Mr. Tuohy gave earlier. Other choices, 
like Apache Camel or choreography languages, have their own strengths and 
difficulties. It may be best for now to leave the fundamental structure of the 
workflow in the GSearch application code but to take pains to see to it that 
sophisticated workflows can _easily_ extend that structure in specific cases. 
XSLT is certainly one tool by which that could be done.

---
A. Soroka
Online Library Environment
the University of Virginia Library

On Oct 24, 2011, at 7:33 PM, Conal Tuohy wrote:

> On 24/10/11 11:19, aj...@virginia.edu wrote:
>> The intention of bringing the structure of the indexing workflow out of XSLT 
>> into the RDF relationships between objects is not primarily to provide for 
>> complex cases, although it can do that. It is, instead, to make that 
>> structure part of the curation of the objects themselves.
>> 
>> The interest of this move follows on the claim that the presentation of 
>> objects increasingly is dependent on indexing (in part because so many 
>> "front-end" frameworks for Fedora rely on indexes to immediately construct 
>> many user-facing web pages, and not on direct retrieval from the repository, 
>> e.g. Hydra or Islandora), and that therefore indexing workflow deserves to 
>> be curated alongside data contents in the _strongest practical way_. I claim 
>> further that the strongest possible way to curate relationships between 
>> content datastreams and indexing transforms in a Fedora repository is in 
>> explicit RDF, and that this is practical.
> So, to be clear, what you're suggesting is that XSLT is indeed used to 
> generate Solr documents, but that the object being indexed would decide 
> which XSLT was used, by having an explicit link to another object which 
> contains the XSLT as a datastream?
> 
> That seems perfectly reasonable to me.
> 
> I assume, then, that having selected the appropriate XSLT, the XSLT 
> itself would be responsible for (recursively) downloading and traversing 
> any RDF relationships it was interested in?
> 
> -- 
> Conal Tuohy
> eResearch Business Analyst
> Victorian eResearch Strategic Initiative
> +61-466324297
> 
> 
> ------------------------------------------------------------------------------
> The demand for IT networking professionals continues to grow, and the
> demand for specialized networking skills is growing even more rapidly.
> Take a complimentary Learning@Cisco Self-Assessment and learn 
> about Cisco certifications, training, and career opportunities. 
> http://p.sf.net/sfu/cisco-dev2dev
> _______________________________________________
> Fedora-commons-users mailing list
> Fedora-commons-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users

