Re: [fcrepo-user] [fcrepo-dev] GSearch planning

Gert Schmeltz Pedersen Mon, 24 Oct 2011 02:39:22 -0700

I have created a JIRA issue in response to the messages below; see

https://jira.duraspace.org/browse/FCREPO-1019 "Exploration of complex GSearch 
use cases"


I want to explore complex GSearch use cases and sketch or implement solutions, 
based on existing and/or potential GSearch functionality. Such functionality 
includes many-repositories-to-many-indexes, indexing XSLT stylesheets that 
create index documents across Fedora datastreams and/or objects, managing GSearch 
configurations in Fedora objects (FCREPO-1018), and interaction between the 
resource index and the Lucene/Solr index(es) (FCREPO-1009). 
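
To make the cross-object case concrete, here is a minimal Python sketch. It is 
not GSearch code: the in-memory OBJECTS and RELATIONS tables, the isPartOf 
predicate, the demo PIDs, and both function names are assumptions made up for 
illustration. It only shows the shape of the problem: one index document 
assembled from several related objects, and a change to a part triggering a 
rebuild of the document it contributes to.

```python
# Hypothetical in-memory stand-in for a repository. A real setup would read
# objects and their RELS-EXT relationships from Fedora or the resource index.
OBJECTS = {
    "demo:book1": {"title": "A Book", "text": ""},
    "demo:page1": {"title": "Page 1", "text": "first page text"},
    "demo:page2": {"title": "Page 2", "text": "second page text"},
}

# (subject, predicate, object) triples in the style of RELS-EXT (assumed data).
RELATIONS = [
    ("demo:page1", "isPartOf", "demo:book1"),
    ("demo:page2", "isPartOf", "demo:book1"),
]


def build_index_document(pid):
    """Build one index document from an object and the parts that point at it."""
    parts = sorted(s for s, p, o in RELATIONS if p == "isPartOf" and o == pid)
    return {
        "id": pid,
        "title": OBJECTS[pid]["title"],
        "part_ids": parts,
        # Concatenate the text of every part so the whole is searchable as one record.
        "fulltext": " ".join(OBJECTS[part]["text"] for part in parts),
    }


def affected_documents(pid):
    """Return the ids of the index documents to rebuild when object pid changes."""
    parents = sorted(o for s, p, o in RELATIONS if s == pid and p == "isPartOf")
    # A part contributes to its parents' documents; otherwise it is its own document.
    return parents or [pid]


if __name__ == "__main__":
    for doc_id in affected_documents("demo:page2"):
        print(build_index_document(doc_id))
```

The second function is the part that a one-object-to-one-document model does 
not cover: an update to a page has to be mapped back to the book's index 
document before anything is re-indexed.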

Contributions and feedback are welcome.

-Gert


On 24/10/2011, at 02.19, <aj...@virginia.edu> wrote:

> The intention of bringing the structure of the indexing workflow out of XSLT 
> into the RDF relationships between objects is not primarily to provide for 
> complex cases, although it can do that. It is, instead, to make that 
> structure part of the curation of the objects themselves.
> 
> The interest of this move follows from the claim that the presentation of 
> objects increasingly depends on indexing (in part because so many 
> "front-end" frameworks for Fedora, e.g. Hydra or Islandora, rely on indexes, 
> and not on direct retrieval from the repository, to immediately construct 
> many user-facing web pages), and that indexing workflow therefore deserves to be 
> curated alongside data contents in the _strongest practical way_. I claim 
> further that the strongest possible way to curate relationships between 
> content datastreams and indexing transforms in a Fedora repository is in 
> explicit RDF, and that this is practical.
> 
> I quite agree that a powerful but unwieldy or opaque style of configuration 
> may be worse than a weaker but more transparent style, but I believe that 
> with enough thought and attention for the specific modeling of workflow, we 
> could provide graceful factoring in configuration through which simple 
> GSearch indexing workflows would incur very little expense (even less than 
> they do now) while sophisticated workflows remain possible.
> 
> ---
> A. Soroka
> Online Library Environment
> the University of Virginia Library
> 
> 
> 
> 
> On Oct 23, 2011, at 8:05 PM, Conal Tuohy wrote:
> 
>> On 17/10/11 11:48, aj...@virginia.edu wrote:
>>> Heartily seconded!
>>> 
>>> In the architecture we're exploring at UVa, we use RELS-INT to define 
>>> relationships between datastreams and indexing transforms. The relevance to 
>>> this issue lies in RELS-EXT. By indexing RELS-EXT as a datastream (and 
>>> assuming that the molecular "para-object" that is responsible for a given 
>>> index record is constructed via RELS-EXT relationships) we can obtain 
>>> information about the other objects that may be involved in any index 
>>> record to which a given object is associated. I'm in agreement that keeping 
>>> the analysis of object relationships for indexing purposes in indexing XSLT 
>>> is _not_ the best way, and instead we look to combine this technique with 
>>> the use of Enhanced Content Model Views to create the kind of multiobject 
>>> records to which Jonathan is pointing by hiding the explicit structure of 
>>> the "para-object" from the indexing XSLT. This may or may not be the best 
>>> possible solution for the problem, so I'm just offering it as a place to 
>>> begin discussion.
>>> 
>>> 
>>> ---
>>> A. Soroka
>>> Online Library Environment
>>> the University of Virginia Library
>>> 
>>> 
>>> 
>>> 
>>> On Oct 16, 2011, at 8:15 PM, Jonathan Green wrote:
>>> 
>>>> Something that I think needs to be considered when moving forward with 
>>>> GSearch is that the index may not always share a one-to-one relationship with 
>>>> objects in Fedora. In a very atomistic content model perhaps the Solr 
>>>> document is actually composed of parts from many related objects. These 
>>>> types of decisions are currently very hard to make in XSLTs.
>> In what way hard? Can you expand a little on the difficulties you see?
>> 
>>>> While I think XSLTs have a place in transforming metadata, there needs to 
>>>> be something more.
>> One issue to keep in mind here is the 80/20 rule. If Fedora's 
>> indexing system is complex enough to allow for all manner of complex 
>> cases, then it may be needlessly complex for many simple cases. A more 
>> complex system would make complex indexing easier, but if it also makes 
>> simpler cases harder (even if only because the configuration system is 
>> harder to understand), then the OVERALL ease-of-use might actually decrease. I don't 
>> think it's possible to strike a perfect balance, but a technology like 
>> XSLT might be a useful catch-all: it can handle simple cases very 
>> simply, but can also be extended arbitrarily (including, for instance, 
>> transcluding metadata from related Fedora objects or other XML datasources).
>> 
>> In very many cases, the mapping of Fedora objects to Solr documents is 
>> very simple and won't, for instance, involve any aggregation. But the 
>> mapping from Fedora objects to Solr documents is in principle arbitrary; 
>> you might choose to do pretty much anything, quite legitimately. You 
>> might have metadata schemas of any type; you might use the RDF store, 
>> you might have external authority files, etc. This is where, I think, a 
>> system which is sufficiently configurable to be fully general could well 
>> end up as complex as an XSLT-based system would be, but without many of 
>> the advantages of XSLT (code libraries, books and mailing lists, programmer 
>> experience, etc.).
>> 
>> It might be enough to ship Fedora with a basic set of XSLT transforms, 
>> and a few sample transforms showing how to use the resource index, etc.
>> -- 
>> 
>> Conal Tuohy
>> eResearch Business Analyst
>> Victorian eResearch Strategic Initiative
>> +61-466324297
>> 
>> 
>> 
>> ------------------------------------------------------------------------------
>> The demand for IT networking professionals continues to grow, and the
>> demand for specialized networking skills is growing even more rapidly.
>> Take a complimentary Learning@Cisco Self-Assessment and learn 
>> about Cisco certifications, training, and career opportunities. 
>> http://p.sf.net/sfu/cisco-dev2dev
>> _______________________________________________
>> Fedora-commons-users mailing list
>> Fedora-commons-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
> 


