Re: [fcrepo-user] [fcrepo-dev] GSearch planning

ajs6f Thu, 13 Oct 2011 13:39:28 -0700

As to roadmaps, I myself hope that discussion on this list, crystallized in 
the DuraSpace Jira issue-tracking system, will make GSearch's direction 
completely transparent.


I can't but point out that a very popular and well-supported XML language for 
describing mappings from XML metadata to the Solr (XML) document format already 
exists: XSLT.
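
For anyone who has not looked at the Solr XML update format, here is a minimal sketch of what any such mapping has to produce, whether it is written in XSLT or in a new XML mapping language. It is in Python rather than XSLT purely for illustration. The sample Dublin Core record, the Solr field names (id, title_t, creator_t) and the function name to_solr_add are all assumptions of mine, not anything taken from GSearch or solrizer.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used in the source paths below.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical Dublin Core record standing in for a metadata datastream.
SAMPLE_DC = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>demo:1</dc:identifier>
  <dc:title>A sample object</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:creator>Roe, Richard</dc:creator>
</oai_dc:dc>"""

# The declarative part of the mapping: (source path, Solr field name).
# The Solr field names are assumptions; a real schema will differ.
FIELD_MAP = [
    ("dc:identifier", "id"),
    ("dc:title", "title_t"),
    ("dc:creator", "creator_t"),
]


def to_solr_add(dc_xml, field_map=FIELD_MAP):
    """Turn a metadata record into a Solr <add><doc> update message."""
    source = ET.fromstring(dc_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for path, solr_name in field_map:
        # Repeated source elements become repeated (multi-valued) fields.
        for node in source.findall(path, DC_NS):
            field = ET.SubElement(doc, "field", {"name": solr_name})
            field.text = (node.text or "").strip()
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(to_solr_add(SAMPLE_DC))
```

An XSLT stylesheet expresses the same thing as one template per source element, with no host-language code needed.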

---
A. Soroka
Online Library Environment
the University of Virginia Library




On Oct 12, 2011, at 8:44 PM, Tom Cramer wrote:

> Gert, Adam,
> 
> In a similar but different vein, at RIRI this year, several of us had 
> conversations about opportunities to share code and componentry between the 
> Hydra and Islandora projects. One possibility that arose was generalizing 
> solrizer[1] to act as a general-purpose Fedora-to-Solr indexing tool. 
> Currently solrizer is an integral component in the Hydra stack, and is Ruby 
> on Rails code that uses the models defined in a given Hydra head to 
> automatically index data stream contents into a solr index which is then used 
> for both search and read operations (via Blacklight).
> 
> To echo elements of Adam's proposal, the concept that emerged was to take the 
> models that define the mappings of data streams to fields of the solr index 
> out of solrizer's Ruby code, and instead express them as XML files. These XML 
> files could then be stored in a Fedora repository. This would remove the 
> platform dependency on Ruby development; it would also keep the mappings on 
> how to interpret / index Fedora objects in the repo instead of application 
> code. 
> 
> Matt Zumwalt (who may be kicking me under the table from Minneapolis right 
> now) will be examining this as a possible architectural direction for 
> solrizer moving forward. 
> 
> I don't know what opportunities, if any, there are for cross-pollination or 
> convergence between solrizer and GSearch, but it seems that roadmap sharing 
> at the very least would be healthy.
> 
> - Tom
> 
> 
> [1] https://github.com/projecthydra/solrizer
> 
> 
> 
> 
> On Oct 12, 2011, at 7:41 AM, aj...@virginia.edu wrote:
> 
>> Here's a less straightforward idea, which I haven't put into a Jira issue 
>> because it warrants discussion, if it even is to become part of the roadmap.
>> 
>> At OR in Austin, I presented an indexing system (based partly on ideas from 
>> GSearch, but not on the GSearch codebase) that we at UVa are working on. One 
>> of the key principles of this system is that because discovery and 
>> presentation for repository contents are increasingly based on indexes, and 
>> because discovery and presentation are parts of curation (viewed broadly), 
>> it is worthwhile to move the configuration of indexing workflows inside the 
>> repository being indexed, so that indexing configuration "lives" alongside 
>> the indexed contents and can be managed through the same services. (In the 
>> example of our system, RELS-INT RDF connects metadata datastreams in 
>> indexable objects with indexer objects that contain indexing 
>> transformations.)
>> 
>> I'd like to propose that the roadmap for GSearch include the task of making 
>> it possible for users to move configuration for indexing transformations 
>> (_not_ necessarily configuration for the connections between indexes and 
>> repositories, but only the configuration of indexing transformations) 
>> _inside_ the repositories being indexed.
>> 
>> One key affordance that would become available would be to manage indexing 
>> transformations through the same APIs as are used for repository contents. 
>> Because changing an index transformation would no longer require altering 
>> material in the local GSearch install, but only the repository, all of the 
>> wonderful functionality that Fedora already supplies as part of the core 
>> repository services would become available (e.g. XACML policy controls, 
>> metadata associations, a nice RESTful API, etc.). 
>> 
>> Doing this would require much careful thought as to how to model and 
>> structure representations of indexing transformations in the repository 
>> context, but it could have great benefits, as tools to manage indexing would 
>> be able to rely on work already done and in progress for the management of 
>> ordinary repository contents.
>> 
>> 
>> ---
>> A. Soroka
>> Online Library Environment
>> the University of Virginia Library
>> 
>> 
>> 
>> 
>> On Oct 12, 2011, at 10:07 AM, Gert Schmeltz Pedersen wrote:
>> 
>>> This message is meant to open a discussion of the roadmap for GSearch. 
>>> It started in a small group, but we invite participation from the wider 
>>> group of fedora-developers. I copy this message to the fedora-users list so 
>>> that GSearch users are informed about the discussion, but to follow it 
>>> onwards and to contribute they have to subscribe to the fedora-developers 
>>> list.
>>> 
>>> I will initiate the discussion with a status update. GSearch 2.2 has been the 
>>> current release since December 2008. At OR2011 in Austin in June 2011 I 
>>> presented a plan for development of GSearch, see 
>>> https://conferences.tdl.org/or/OR2011/OR2011main/paper/view/416/127 . 
>>> Following that, I have provided GSearch 2.3, and the official release is 
>>> near. You can get the source at https://github.com/fcrepo/gsearch and 
>>> fedoragsearch.war from the DTU prerelease site at 
>>> http://www.cvt.dk/fedoragsearch/ and see the documentation page at 
>>> http://miranth.cvt.dk/fedoragsearch/ .
>>> 
>>> Next step in the plan is to provide GSearch 2.4 by the end of the year. I 
>>> will use the issue tracker at 
>>> https://jira.duraspace.org/secure/IssueNavigator.jspa?mode=hide&requestId=10311
>>> to track the work, and I invite your feedback and contributions. Potential 
>>> committers may be enrolled; I have already had some responses to my invitation 
>>> to potential committers at OR2011. Some of you may have heard at OR2011 
>>> that I will retire by the end of the year. However, I will continue 
>>> part-time to support GSearch users on the fedora-users list and continue to 
>>> develop for GSearch and Fedora in partnership with people who have an 
>>> interest in that.
>>> 
>>> The post-2.4 roadmap discussion can take place both on this list and as new 
>>> or modified issues in the issue tracker. I think that members of the initial 
>>> small group will soon bring up issues.
>>> 
>>> Gert
>>> ------------------------------------------------------------------------------
>>> All the data continuously generated in your IT infrastructure contains a
>>> definitive record of customers, application performance, security
>>> threats, fraudulent activity and more. Splunk takes this data and makes
>>> sense of it. Business sense. IT sense. Common sense.
>>> http://p.sf.net/sfu/splunk-d2d-oct
>>> _______________________________________________
>>> Fedora-commons-developers mailing list
>>> fedora-commons-develop...@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
>> 
>> 
> 


