I have created a JIRA issue in response to the messages below; see https://jira.duraspace.org/browse/FCREPO-1019 ("Exploration of complex GSearch use cases").
I want to explore complex GSearch use cases and sketch or implement solutions, based on existing and/or potential GSearch functionality. Such functionality includes many-repositories-to-many-indexes setups, indexing XSLT stylesheets that create index documents across Fedora datastreams and/or objects, managing GSearch configurations in Fedora objects (FCREPO-1018), and interaction between the resource index and the Lucene/Solr index(es) (FCREPO-1009). Contributions and feedback are welcome.

-Gert

On 24/10/2011, at 02.19, <aj...@virginia.edu> wrote:

> The intention of bringing the structure of the indexing workflow out of XSLT
> into the RDF relationships between objects is not primarily to provide for
> complex cases, although it can do that. It is, instead, to make that
> structure part of the curation of the objects themselves.
>
> The interest of this move follows on the claim that the presentation of
> objects increasingly is dependent on indexing (in part because so many
> "front-end" frameworks for Fedora rely on indexes to immediately construct
> many user-facing web pages, and not on direct retrieval from the repository,
> e.g. Hydra or Islandora), and that therefore indexing workflow deserves to be
> curated alongside data contents in the _strongest practical way_. I claim
> further that the strongest possible way to curate relationships between
> content datastreams and indexing transforms in a Fedora repository is in
> explicit RDF, and that this is practical.
>
> I quite agree that a powerful but unwieldy or opaque style of configuration
> may be worse than a weaker but more transparent style, but I believe that
> with enough thought and attention to the specific modeling of workflow, we
> could provide graceful factoring in configuration through which simple
> GSearch indexing workflows would incur very little expense (and even less
> than they now do) but sophisticated workflows remain possible.
>
> ---
> A. Soroka
> Online Library Environment
> the University of Virginia Library
>
> On Oct 23, 2011, at 8:05 PM, Conal Tuohy wrote:
>
>> On 17/10/11 11:48, aj...@virginia.edu wrote:
>>> Heartily seconded!
>>>
>>> In the architecture we're exploring at UVa, we use RELS-INT to define
>>> relationships between datastreams and indexing transforms. The relevance to
>>> this issue lies in RELS-EXT. By indexing RELS-EXT as a datastream (and
>>> assuming that the molecular "para-object" that is responsible for a given
>>> index record is constructed via RELS-EXT relationships) we can obtain
>>> information about the other objects that may be involved in any index
>>> record to which a given object is associated. I'm in agreement that keeping
>>> the analysis of object relationships for indexing purposes in indexing XSLT
>>> is _not_ the best way, and instead we look to combine this technique with
>>> the use of Enhanced Content Model Views to create the kind of multi-object
>>> records to which Jonathan is pointing by hiding the explicit structure of
>>> the "para-object" from the indexing XSLT. This may or may not be the best
>>> possible solution for the problem, so I'm just offering it as a place to
>>> begin discussion.
>>>
>>> ---
>>> A. Soroka
>>> Online Library Environment
>>> the University of Virginia Library
>>>
>>> On Oct 16, 2011, at 8:15 PM, Jonathan Green wrote:
>>>
>>>> Something that I think needs to be considered when moving forward with
>>>> GSearch is that the index may not always share a one-to-one relationship
>>>> with objects in Fedora. In a very atomistic content model perhaps the Solr
>>>> document is actually composed of parts from many related objects. These
>>>> types of decisions are currently very hard to make in XSLTs.
>> In what way hard? Can you expand a little on the difficulties you see?
>>
>>>> While I think XSLTs have a place in transforming metadata, there needs to
>>>> be something more.
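A minimal sketch of the RELS-EXT-driven "para-object" idea discussed above, in which one index document is assembled from several related objects instead of from a single one. It assumes a hypothetical `http://example.org/relations#hasPart` predicate, hypothetical `demo:` PIDs, and a plain Python dict standing in for the repository; it is not GSearch code and calls no Fedora API. Visited PIDs are tracked so that cyclic relationships cannot make the walk loop forever.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical predicate standing in for whatever relationship the
# content models actually use to assemble a "para-object".
HAS_PART = "http://example.org/relations#hasPart"


def related_pids(rels_ext_xml: str, predicate: str = HAS_PART) -> list[str]:
    """Return the PIDs an object's RELS-EXT points to via `predicate`."""
    root = ET.fromstring(rels_ext_xml)
    ns_uri, _, local = predicate.rpartition("#")
    tag = "{%s#}%s" % (ns_uri, local)
    pids = []
    for el in root.iter(tag):
        target = el.get("{%s}resource" % RDF_NS, "")
        # Object references in RELS-EXT are info:fedora/<pid> URIs.
        pids.append(target.removeprefix("info:fedora/"))
    return pids


def build_index_doc(root_pid: str, repo: dict) -> dict:
    """Merge fields from root_pid and every object reachable via HAS_PART.

    `repo` is a hypothetical in-memory stand-in for the repository:
    pid -> {"RELS-EXT": RDF/XML string or None, "fields": {name: [values]}}.
    """
    doc = {"id": root_pid}
    seen, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in repo:
            continue  # skip cycles and dangling references
        seen.add(pid)
        obj = repo[pid]
        for name, values in obj["fields"].items():
            doc.setdefault(name, []).extend(values)
        if obj.get("RELS-EXT"):
            stack.extend(related_pids(obj["RELS-EXT"]))
    return doc


def rels_ext(subject: str, *parts: str) -> str:
    """Build a minimal RELS-EXT document (for the demo below)."""
    links = "".join(
        '<rel:hasPart rdf:resource="info:fedora/%s"/>' % p for p in parts)
    return (
        '<rdf:RDF xmlns:rdf="%s" xmlns:rel="http://example.org/relations#">'
        '<rdf:Description rdf:about="info:fedora/%s">%s</rdf:Description>'
        '</rdf:RDF>' % (RDF_NS, subject, links))


if __name__ == "__main__":
    repo = {
        "demo:book": {"RELS-EXT": rels_ext("demo:book", "demo:p1", "demo:p2"),
                      "fields": {"title": ["A Book"]}},
        "demo:p1": {"RELS-EXT": None, "fields": {"text": ["page one"]}},
        "demo:p2": {"RELS-EXT": None, "fields": {"text": ["page two"]}},
    }
    print(build_index_doc("demo:book", repo))
```

Because the relationship walk happens before any stylesheet runs, an indexing XSLT applied to the result would only see the merged fields, not the structure of the para-object.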
>> One issue to keep in mind here is the 80/20 rule. If Fedora's
>> indexing system is complex enough to allow for all manner of complex
>> cases, then it may be needlessly complex for many simple cases. A more
>> complex system would make complex indexing easier, but if it also makes
>> simpler cases harder (even just harder to understand a configuration
>> system), then the OVERALL ease-of-use might actually decrease. I don't
>> think it's possible to strike a perfect balance, but a technology like
>> XSLT might be a useful catch-all: it can handle simple cases very
>> simply, but can also be extended arbitrarily (including, for instance,
>> transcluding metadata from related Fedora objects or other XML datasources).
>>
>> In very many cases, the mapping of Fedora objects to Solr documents is
>> very simple and won't, for instance, involve any aggregation. But the
>> mapping from Fedora objects to Solr documents is in principle arbitrary;
>> you might choose to do pretty much anything, quite legitimately. You
>> might have metadata schemas of any type; you might use the RDF store,
>> you might have external authority files, etc. This is where, I think, a
>> system which is sufficiently configurable to be fully general could well
>> end up as complex as an XSLT-based system would be, but without many of
>> the advantages of XSLT (code libraries, books and mailing lists, programmer
>> experience, etc).
>>
>> It might be enough to ship Fedora with a basic set of XSLT transforms,
>> and a few sample transforms showing how to use the resource index, etc.
>> --
>>
>> Conal Tuohy
>> eResearch Business Analyst
>> Victorian eResearch Strategic Initiative
>> +61-466324297
>>
>> ------------------------------------------------------------------------------
>> The demand for IT networking professionals continues to grow, and the
>> demand for specialized networking skills is growing even more rapidly.
>> Take a complimentary Learning@Cisco Self-Assessment and learn
>> about Cisco certifications, training, and career opportunities.
>> http://p.sf.net/sfu/cisco-dev2dev
>> _______________________________________________
>> Fedora-commons-users mailing list
>> Fedora-commons-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users