[Dspace-tech] Happy Belated Software Freedom Day! (or Collaborations between Fedora and DSpace ...)

Mark Diggory Sun, 21 Sep 2008 14:44:03 -0700

(Note: this is a cross-posting in the fedora and dspace communities)

Hello and Happy Belated Software Freedom Day!
http://www.softwarefreedomday.org/


I'm excited to recently be participating in efforts within our  
community to find new synergies between DSpace and Fedora. Likewise,  
the recent announcement by both Foundations that they are excited  
about working together is positive.

I originally met in the spring with Chris Wilper, Dan Davis and  
Matthew Zumwalt of the Fedora Commons community at JA-SIG 2008 in St.  
Paul and had several very positive interactions with them about  
synergies between the two projects. I recall that during that  
discussion, the strongest point made was that DSpace and Fedora fill  
very different niches in the ecosystem of Digital Libraries and  
Content Management. I was excited to see the recognition of this on  
their side of the table as well.

At that time we were just warming up for the DSpace GSoC 2008  
selection process and had several interesting options come onto the  
table around this topic of interoperability between DSpace and  
Fedora.  As I recall, the original suggestion that it should be in
the list came from either Jim Downing or Scott Yeadon back in the
early spring of this year.  My point is that the
roots of the latest endeavors are arising from multiple sources  
within our communities. But this said, I feel the recent  
announcements by both foundations to begin working collaboratively  
are now galvanizing this process.  I look forward to participating in  
these new initiatives and hope others are interested as well.

I think we are already starting to see the manifestation of this  
relationship today.  There have been several projects within the  
DSpace community that have been going on this summer and I'd like to  
informally report on them.


Google Summer of Code:

Recently DSpace participated in the Google Summer of Code, and we had
four students working on various projects
(http://wiki.dspace.org/index.php/Google_Summer_of_Code). All these
projects were very interesting and successfully completed by the
students.  The DSpace/Fedora integration project (Andrius Blažinskas
mentored by Richard
Rodgers) and Semantic Web Enabling project (Peter Coetzee mentored by  
myself) both explored extremely positive possibilities around opening  
up the backend assetstore and metadata management services of DSpace  
to alternative solutions and allowing for the mapping and storage of  
DSpace Content and Metadata to Fedora Digital Objects and  
alternatively, its expression in RDF and exposure as Linked Open Data  
via SPARQL endpoints.


DSpace Fedora Integration:

Mapping DSpace Objects to Fedora Digital Object representations.
http://wiki.dspace.org/index.php/Google_Summer_of_Code_2008_Fedora_Integration

At its heart the Fedora integration is an implementation of the  
current Data Access Object interfaces we have been working on for  
DSpace 2.0 and successfully represented DSpace Communities,  
Collections, Items and their Bitstream Contents as Fedora Objects  
with RELS-EXT relations between them and metadata expressed as XML  
datastreams.
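
As a rough illustration of what a RELS-EXT relation between two such
objects can look like, here is a minimal Python sketch. It is not the
GSoC project's code: the PIDs ("dspace:item-1", "dspace:collection-1")
and the choice of the isMemberOfCollection predicate are assumptions
made for the example, and the function name rels_ext is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace of Fedora's external-relations vocabulary.
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def rels_ext(pid, relations):
    """Build RELS-EXT style RDF/XML for one object.

    relations is a list of (predicate_local_name, target_pid) pairs.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{pid}"},
    )
    for predicate, target in relations:
        ET.SubElement(
            desc,
            f"{{{REL_NS}}}{predicate}",
            {f"{{{RDF_NS}}}resource": f"info:fedora/{target}"},
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical PIDs: an Item that belongs to a Collection.
xml_text = rels_ext(
    "dspace:item-1", [("isMemberOfCollection", "dspace:collection-1")]
)
print(xml_text)
```

The same function could be called once per Community, Collection,
Item and Bitstream object, with whichever predicates the mapping
actually settles on.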


DSpace RDF Triplestore Integration:

http://wiki.dspace.org/index.php/Google_Summer_of_Code_2008_Fedora_Integration

The recent activity around the Linked Open Data community and the  
Semantic Web led us to want to explore how the DSpace data model and  
metadata services could be implemented on top of an RDF triplestore  
and how those LoD Representations could be exposed to the world and  
made queryable via SPARQL endpoints. The outcome of the project
allowed us to:

1.) Produce an ontology for DSpace (http://purl.org/dspace/model) for  
serialization into RDF triple-stores and for querying via SW/LoD  
SPARQL endpoints.

2.) Implement a Storage layer for DSpace 2.0 that provided a mapping  
to RDF and Query Support capabilities via D2RQ mappings and Jena  
Triplestores.
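
To make the idea of querying via a SPARQL endpoint a little more
concrete, here is a small Python sketch that builds a SPARQL protocol
GET request URL. The endpoint and collection URLs are hypothetical
placeholders, and the query only assumes that items are related to
their collection with dcterms:isPartOf; it does not reflect the actual
terms of the ontology at http://purl.org/dspace/model.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; substitute the URL of a real SPARQL service.
ENDPOINT = "http://repository.example.org/sparql"

# Assumed data shape: items point at their collection via dcterms:isPartOf.
QUERY = """PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item WHERE {
  ?item dcterms:isPartOf <http://repository.example.org/collection/1> .
}"""


def sparql_get_url(endpoint, query):
    """Return the GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Any HTTP client can then fetch that URL, asking for a result format
such as SPARQL/XML or JSON through the Accept header.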


DSpace 2.0 Funded Development:

This Fall I am a member of a funded team working to establish the  
foundations for a concrete DSpace 2.0 design and implementation. This  
includes, among various other topics, re-architecting the DSpace Data  
Model and aligning it with other platforms/standards for representing  
content in repositories. In my opinion, this represents an ideal  
opportunity to open discussions within both communities on how these  
tools can be better leveraged, their niches better defined, and best  
practices on their interoperability better established.

Our initial activity around DSpace 2.0 has involved opening up the  
data model to support the better attachment of metadata to DSpace  
objects (Communities, Collections, Items, Bundles, Bitstreams) and  
also to support the attachment of "Content" more generically along  
the various objects within the model. For us this presents a new need  
for Common shared API/protocols/services that will allow for the  
configuration of extensible and stackable Content and Metadata  
Repository Interfaces in DSpace 2.0 and a more common shared  
representation of that Content+Metadata across such a heterogeneous  
environment of services. In researching this possibility for an  
implementation, I've reviewed the Fedora APIs and Object model, as  
well as the APIs found in Sword, JCR (JSR 170/283) and CMIS.

What intrigues me the most about the projects above is that the
problem domain for enabling the exposure and management of content in
tools such as DSpace and Fedora is actually a subset of a larger
direction that the whole CMS industry/sector is moving in with
technologies and standards such as APP and CMIS. This is very
promising for the future of our services, as we will begin to see
these initiatives solve some of the problems we experience right now
because of heterogeneous parallel solutions to the same problem
domain.


External Technology Fronts:

In my opinion, (1) the Linked Open Data, semantic web, RDF centric  
approach to describing the relationships between "Resources" and (2)  
exposing these Resource representations as the basis for a RESTful  
metadata service (with endpoints such as SQL2, JQOM, SPARQL for query)
will form the basis for the future interoperability of not only  
Fedora/DSpace tools, but also for these tools with two larger  
communities:

A.) The Linked Open Data movement: the semantically linked Data Web,
with projects such as LoD, and the DCMI and Library of Congress
exposure of metadata and classification registries.

B.) The CMIS/JCR content repository standards communities that are  
manifesting common shared API and protocols for interacting more  
generically with Content Repositories.

I'd be quite interested in seeing how these technologies will alter  
the Library community’s concepts and expectations behind registries  
and vocabulary. It can already be seen today in projects like the  
Library of Congress Standards & Research Data Values Registry:

http://www.loc.gov:8081/standards/registry/lists.html
http://metadataregistry.org/
http://dcmi.kc.tsukuba.ac.jp/dcregistry/

I hope to see it begin to impact other registries as we begin to  
recognize (1) our metadata is content and (2) it is heavily  
interlinked. I hope to see the effects eventually percolate into  
projects such as the GDFR/Pronom Format Registry services.

http://www.nationalarchives.gov.uk/PRONOM
http://www.formatregistry.org/

I would be very excited to see common response formats (SPARQL/XML,  
RDF/XML, JSON) and common Query Syntaxes (SPARQL) on such data-sets  
in such a way as to allow the reuse of existing clients and popular  
technologies that can be more easily transferred to employees and
across positions within the sector.


Application to our Community:

Interestingly, having evolved in a grant-funded research community,
many of these projects perceive their work as competitive or opposing
in nature. I perceive a revolution occurring, at least within our own
small community of DSpace, where work is shifting away from this
strategy and towards a more traditional Open Source / Open Community
model which, while appearing altruistic at first, is actually
self-serving for those individuals and organizations that participate
within it, allowing them to reduce or altogether eliminate this
replication of effort and allow healthier synergies to evolve.

It is becoming clearer that many of these projects have important and  
separate niches to fill within the larger sector. For instance, in  
the larger sector, JCR and CMIS are not competitive but complementary.

JCR = an API
CMIS = a Protocol

A JCR CMIS driver represents an opportunity to see how these are
complementary and how they would serve both the Java community and
the larger CMS community. Likewise, they provide a clear blueprint
for API implementation in other languages as well as for vendors that
wish to implement such services.
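
The API/protocol split can be sketched in a few lines of Python. This
is only an illustration of the pattern, not JCR or CMIS themselves:
the class names, the get_content method and the "/objects/<id>" URL
layout are all hypothetical, and the transport is a stand-in function
rather than a real HTTP client.

```python
from abc import ABC, abstractmethod


class ContentRepository(ABC):
    """Hypothetical in-process API (the role JCR plays for Java code)."""

    @abstractmethod
    def get_content(self, object_id: str) -> bytes:
        ...


class ProtocolDriver(ContentRepository):
    """Implements the API by speaking a wire protocol (the role CMIS plays)."""

    def __init__(self, base_url, transport):
        self.base_url = base_url.rstrip("/")
        # transport is any callable taking (method, url) and returning bytes.
        self.transport = transport

    def get_content(self, object_id):
        # Assumed URL layout, invented for this sketch.
        return self.transport("GET", f"{self.base_url}/objects/{object_id}")


def fake_transport(method, url):
    """Stand-in for an HTTP client: echoes the request it was asked to make."""
    return f"{method} {url}".encode("utf-8")


repo = ProtocolDriver("http://repository.example.org/api/", fake_transport)
body = repo.get_content("item-1")
print(body)
```

Application code depends only on ContentRepository, so a different
driver (another protocol, or a local store) can be swapped in without
changing callers.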

Applying the same blueprint in the Fedora/DSpace communities we see  
an opportunity for DSpace and Fedora Communities to clarify the roles  
of APIs and Protocols within them:

Applications:
DSpace = a User Application (currently with its own internal CMS)
Fedora = a User Application that is a CMS

Protocols:
Fedora API (SOAP/REST) = a SOAP/REST protocol for interacting with  
Fedora
DSpace LNI = a SOAP/WEB-DAV protocol for interacting with DSpace
Sword = an APP/REST protocol for interacting with DSpace, Fedora, etc.

The latter group represents a set of services with protocols that
allow one to interact with a very "narrow niche" of CMS applications
in the Open Repositories Community. I find this the interesting point
where more commercial CMS protocols such as CMIS may become the
future direction. Should we be considering that CMIS is a super-set
of what Sword is going to give us?

I feel there are abstractions that will become more salient as we  
explore the collaborative space between DSpace and Fedora.

A.) Identification:

I was recently intrigued to find out that Fedora can support
expressing its IDs as URIs other than those in the "info:fedora"
scheme:namespace.

http://www.fedora.info/definitions/identifiers/
http://oxfordrepo.blogspot.com/2008/01/conclusions-on-uuids-and-local-ids-in.html

I think we are starting to recognize that identification of resources
within a CMS needs to be pluralistic in nature, given that there are
so many efforts to establish naming/identification schemes within the
community.

B.) Relationships (Ontology)

I'm intrigued to think that this capability, and the usage of the
RELS-EXT RDF datastream as a holding place for the DSpace
relationships/metadata found in the DSpace 2.0 Ontology, allows a
clean alignment between DSpace Resources (Objects) and Fedora Objects
and will hold interesting possibilities for bringing together DSpace
services on top of Fedora Repositories.

DSpace and Fedora RELS-EXT Ontologies offer an opportunity for making
the above relationships more explicit. Currently, the RELS-EXT
ontology is a closed ontology that does not extend the common
Properties that already exist in the RDF community today. I can see a
serious benefit in making the Properties in the RELS-EXT ontology
subProperties of dcterms:isPartOf / dcterms:hasPart. This would start
these ontologies down the road to alignment, as the DSpace Ontology
already uses them. Likewise, there may be benefit in using other
shared ontologies (OAI-ORE, for instance) as the basis for expressing
relations that represent aggregations/containership. I see many
further opportunities to begin to draw equivalencies in this area.

Alignments in ontology between DSpace and Fedora will allow us to  
begin to utilize inference capabilities in RDF triplestores such as  
Mulgara to be able to retrieve equivalent statements made in the two  
ontologies in a common ontological expression (DC, DCTERMS, ORE,  
BIBO, etc) rather than two separate application specific ontologies.
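
The kind of inference meant here can be sketched with plain Python
tuples standing in for a triplestore. The sketch applies the RDFS
subPropertyOf rule: if p is a subProperty of q and (s, p, o) holds,
then (s, q, o) holds as well. Declaring Fedora's isMemberOfCollection
a subProperty of dcterms:isPartOf is the proposed alignment, not
something the RELS-EXT ontology states today, and all object
identifiers below are hypothetical.

```python
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"
REL_IS_MEMBER_OF_COLLECTION = (
    "info:fedora/fedora-system:def/relations-external#isMemberOfCollection"
)
SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def infer_subproperties(triples):
    """Return triples plus everything implied by rdfs:subPropertyOf."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new statement can be derived
        changed = False
        sub = {(s, o) for (s, p, o) in inferred if p == SUBPROPERTY_OF}
        for (s, p, o) in list(inferred):
            for (child, parent) in sub:
                if p == child and (s, parent, o) not in inferred:
                    inferred.add((s, parent, o))
                    changed = True
    return inferred


triples = {
    # Proposed alignment (an assumption, not part of RELS-EXT today).
    (REL_IS_MEMBER_OF_COLLECTION, SUBPROPERTY_OF, DCTERMS_IS_PART_OF),
    # A Fedora-style statement with hypothetical PIDs.
    ("info:fedora/demo:item-1", REL_IS_MEMBER_OF_COLLECTION,
     "info:fedora/demo:collection-1"),
    # A DSpace-style statement that already uses dcterms:isPartOf.
    ("http://repository.example.org/item/2", DCTERMS_IS_PART_OF,
     "http://repository.example.org/collection/2"),
}

closure = infer_subproperties(triples)
parts = sorted(s for (s, p, o) in closure if p == DCTERMS_IS_PART_OF)
print(parts)
```

After inference, a single query on dcterms:isPartOf finds both the
Fedora-style and the DSpace-style item, which is the benefit of
aligning the two ontologies on shared properties.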


Conclusion:

In closing, I'll just reiterate that we have great opportunities now
for synergy where before there were barriers.  I hope that we will
see this reflected in the conferences that we attend.  Attending the
JA-SIG conference last Spring opened my eyes to how the community can
benefit when developers from different projects are all placed in a
room together to talk about the technologies they use.  I think this
would make for an excellent format for Open Repositories conferences
and would benefit our newly formed foundations and communities
immensely.  Last year we saw Bar-Camps becoming popular and happening
at Open Repositories.  It would be very exciting to see more
opportunities for "Un-Conferences" where we can bring together
development teams from both communities in a more formal setting to
promote collaboration.

Cheers,
Mark Diggory

p.s. This post represents my initial foray into the Blogging world. I
hope to link back to any exciting conversation threads that may arise
around it on my new blog. http://purl.org/net/mdiggory/blog


~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage
Blog: http://purl.org/net/mdiggory/blog

  