(Note: this is a cross-post to the Fedora and DSpace communities.)

Hello, and Happy Belated Software Freedom Day! http://www.softwarefreedomday.org/
I'm excited to have recently begun participating in efforts within our community to find new synergies between DSpace and Fedora. Likewise, the recent announcement by both Foundations that they are excited about working together is a positive sign.

I originally met with Chris Wilper, Dan Davis and Matthew Zumwalt of the Fedora Commons community in the spring, at JA-SIG 2008 in St. Paul, and had several very positive interactions with them about synergies between the two projects. I recall that during that discussion the strongest point made was that DSpace and Fedora fill very different niches in the ecosystem of Digital Libraries and Content Management, and I was excited to see this recognized on their side of the table as well. At that time we were just warming up for the DSpace GSoC 2008 selection process, and several interesting options around interoperability between DSpace and Fedora came onto the table. I recall that the original suggestion that it be on the list came from either Jim Downing or Scott Yeadon in the early spring of this year. My point is that the roots of the latest endeavors arise from multiple sources within our communities. That said, I feel the recent announcements by both foundations that they will begin working collaboratively are now galvanizing this process. I look forward to participating in these new initiatives and hope others are interested as well.

I think we are already starting to see this relationship manifest today. Several projects have been under way in the DSpace community this summer, and I'd like to informally report on them.

Google Summer of Code:

Recently DSpace participated in the Google Summer of Code; we had four students working on various projects (http://wiki.dspace.org/index.php/Google_Summer_of_Code). All of these projects were very interesting and were successfully completed by the students.
The DSpace/Fedora integration project (Andrius Blažinskas, mentored by Richard Rodgers) and the Semantic Web Enabling project (Peter Coetzee, mentored by me) both explored very promising possibilities for opening up the backend assetstore and metadata management services of DSpace to alternative solutions. The first maps and stores DSpace content and metadata as Fedora Digital Objects; the second expresses them in RDF and exposes them as Linked Open Data via SPARQL endpoints.

DSpace Fedora Integration: Mapping DSpace Objects to Fedora Digital Object representations
http://wiki.dspace.org/index.php/Google_Summer_of_Code_2008_Fedora_Integration

At its heart, the Fedora integration is an implementation of the Data Access Object (DAO) interfaces we have been working on for DSpace 2.0. It successfully represented DSpace Communities, Collections, Items and their Bitstream contents as Fedora Objects, with RELS-EXT relations between them and metadata expressed as XML datastreams.

DSpace RDF Triplestore Integration:
http://wiki.dspace.org/index.php/Google_Summer_of_Code_2008_Fedora_Integration

The recent activity in the Linked Open Data and Semantic Web communities led us to explore how the DSpace data model and metadata services could be implemented on top of an RDF triplestore, and how those LoD representations could be exposed to the world and made queryable via SPARQL endpoints. The outcome of the project allowed us to:

1.) Produce an ontology for DSpace (http://purl.org/dspace/model) for serialization into RDF triplestores and for querying via SW/LoD SPARQL endpoints.
2.) Implement a storage layer for DSpace 2.0 that provided a mapping to RDF and query support via D2RQ mappings and Jena triplestores.

DSpace 2.0 Funded Development:

This fall I am a member of a funded team working to establish the foundations for a concrete DSpace 2.0 design and implementation.
This includes, among various other topics, re-architecting the DSpace data model and aligning it with other platforms/standards for representing content in repositories. In my opinion, this is an ideal opportunity to open discussions within both communities on how these tools can be better leveraged, their niches better defined, and best practices for their interoperability better established.

Our initial DSpace 2.0 work has involved opening up the data model to better support attaching metadata to DSpace objects (Communities, Collections, Items, Bundles, Bitstreams), and to support attaching "Content" more generically to the various objects in the model. For us, this creates a need for common, shared APIs/protocols/services that allow extensible and stackable Content and Metadata Repository interfaces to be configured in DSpace 2.0, and for a more common, shared representation of that Content+Metadata across such a heterogeneous environment of services. While researching possible implementations, I've reviewed the Fedora APIs and object model, as well as the APIs found in SWORD, JCR (JSR-170/283) and CMIS.

What intrigues me most about the projects above is that the problem of exposing and managing content in tools such as DSpace and Fedora is actually a subset of a larger direction the whole CMS industry is moving in, with technologies and standards such as APP and CMIS. This is very promising for the future of our services, as these initiatives may begin to solve some of the problems we experience today with heterogeneous, parallel solutions to the same problem domain.
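One way to picture "extensible and stackable Content and Metadata Repository interfaces" is a single storage interface plus layers that wrap one another. The Python sketch below is purely illustrative: `Repository`, `InMemoryRepository` and `TripleLoggingRepository` are hypothetical names, not DSpace 2.0 APIs, and the triple-recording layer merely stands in for a real Fedora- or RDF-backed implementation like those the GSoC projects explored.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Repository(ABC):
    """Hypothetical storage interface: any DSpace-style object (community,
    collection, item, bundle or bitstream), addressed by an opaque id, can
    carry metadata values and named content streams."""

    @abstractmethod
    def add_metadata(self, obj_id: str, field: str, value: str) -> None: ...

    @abstractmethod
    def get_metadata(self, obj_id: str, field: str) -> list[str]: ...

    @abstractmethod
    def put_content(self, obj_id: str, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get_content(self, obj_id: str, name: str) -> bytes: ...


class InMemoryRepository(Repository):
    """Bottom of the stack: keeps metadata and content in dictionaries."""

    def __init__(self) -> None:
        self._metadata: dict[tuple[str, str], list[str]] = {}
        self._content: dict[tuple[str, str], bytes] = {}

    def add_metadata(self, obj_id, field, value):
        self._metadata.setdefault((obj_id, field), []).append(value)

    def get_metadata(self, obj_id, field):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._metadata.get((obj_id, field), []))

    def put_content(self, obj_id, name, data):
        self._content[(obj_id, name)] = data

    def get_content(self, obj_id, name):
        return self._content[(obj_id, name)]


class TripleLoggingRepository(Repository):
    """Stackable layer: delegates every call to the repository beneath it and
    also records each metadata write as a (subject, predicate, object) tuple,
    standing in for an RDF or Fedora mirror."""

    def __init__(self, inner: Repository) -> None:
        self._inner = inner
        self.triples: list[tuple[str, str, str]] = []

    def add_metadata(self, obj_id, field, value):
        self._inner.add_metadata(obj_id, field, value)
        self.triples.append((obj_id, field, value))

    def get_metadata(self, obj_id, field):
        return self._inner.get_metadata(obj_id, field)

    def put_content(self, obj_id, name, data):
        self._inner.put_content(obj_id, name, data)

    def get_content(self, obj_id, name):
        return self._inner.get_content(obj_id, name)


# Usage: stack the logging layer on top of the in-memory store.
# The ids and field names are illustrative placeholders.
repo = TripleLoggingRepository(InMemoryRepository())
repo.add_metadata("collection/7", "dc.title", "Theses")
repo.add_metadata("item/42", "dc.title", "An Example Thesis")
repo.put_content("bitstream/99", "ORIGINAL", b"example bytes")
print(repo.get_metadata("item/42", "dc.title"))  # ['An Example Thesis']
print(repo.triples[0])  # ('collection/7', 'dc.title', 'Theses')
```

Because every layer implements the same interface, layers can be stacked or swapped by configuration without callers noticing; a real implementation would also need identifiers, transactions and error handling, all omitted here.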
External Technology Fronts:

In my opinion, (1) the Linked Open Data / Semantic Web / RDF-centric approach to describing the relationships between "Resources", and (2) exposing these Resource representations as the basis for a RESTful metadata service (with query endpoints such as SQL2, JQOM and SPARQL), will form the basis for future interoperability, not only between Fedora and DSpace but also between these tools and two larger communities:

A.) The Linked Open Data movement: the semantically linked Data Web, with projects such as LoD, DCMI and the Library of Congress exposing metadata and classification registries.
B.) The CMIS/JCR content repository standards communities, which are establishing common, shared APIs and protocols for interacting more generically with content repositories.

I'd be quite interested to see how these technologies will alter the library community's concepts of, and expectations for, registries and vocabularies. This can already be seen today in projects like the Library of Congress Standards & Research Data Values Registry:

http://www.loc.gov:8081/standards/registry/lists.html
http://metadataregistry.org/
http://dcmi.kc.tsukuba.ac.jp/dcregistry/

I hope to see this begin to impact other registries as we recognize that (1) our metadata is content and (2) it is heavily interlinked. I hope the effects eventually percolate into projects such as the GDFR/PRONOM format registry services:

http://www.nationalarchives.gov.uk/PRONOM
http://www.formatregistry.org/

I would be very excited to see common response formats (SPARQL/XML, RDF/XML, JSON) and common query syntaxes (SPARQL) offered on such data sets, so that existing clients and popular technologies can be reused, and skills transfer more easily between employees and positions across the sector.

Application to our Community:

Interestingly, because they evolved in a grant-funded research community, many of these projects perceive their work as competitive or opposing in nature.
I perceive a revolution occurring, at least within our own small DSpace community, where work is shifting away from this strategy and towards a more traditional Open Source / Open Community model. While this model appears altruistic at first, it is actually self-serving for the individuals and organizations that participate in it: it lets them reduce or altogether eliminate this replication of effort and allows healthier synergies to evolve. It is becoming clearer that many of these projects have important and separate niches to fill within the larger sector.

For instance, in the larger sector, JCR and CMIS are not competitive but complementary:

JCR = an API
CMIS = a protocol

A CMIS driver for JCR represents an opportunity to see how the two complement each other and how they serve both the Java community and the larger CMS community. Likewise, they provide a clear blueprint for implementing the API in other languages, as well as for vendors that wish to implement such services.

Applying the same blueprint to the Fedora/DSpace communities, we see an opportunity for both communities to clarify the roles of APIs and protocols within them:

Applications:
DSpace = a user application (currently with its own internal CMS)
Fedora = a user application that is a CMS

Protocols:
Fedora API (SOAP/REST) = a SOAP/REST protocol for interacting with Fedora
DSpace LNI = a SOAP/WebDAV protocol for interacting with DSpace
SWORD = an APP/REST protocol for interacting with DSpace, Fedora, etc.

The latter group is a set of services whose protocols let one interact with a very "narrow niche" of CMS applications in the Open Repositories community. This is where I find the question interesting: ventures into more commercial CMS protocols such as CMIS may set the future direction here. Should we be considering that CMIS is a superset of what SWORD is going to give us?
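Since SWORD is built on the Atom Publishing Protocol (APP), and CMIS also defines an AtomPub binding, a client that can write and read plain Atom entries already shares some ground with both. The Python sketch below builds and parses a minimal RFC 4287 Atom entry using only the standard library; it deliberately leaves out SWORD's deposit extensions, CMIS object properties and any HTTP exchange, and the identifier, title and author are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM_NS}


def build_atom_entry(entry_id: str, title: str, author: str,
                     updated: datetime) -> str:
    """Serialize a minimal RFC 4287 Atom entry (id, title, updated, author)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    # Atom dates are RFC 3339 timestamps; here we always emit UTC with a 'Z'.
    utc = updated.astimezone(timezone.utc)
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    # default_namespace (Python 3.8+) serializes the elements without a prefix.
    return ET.tostring(entry, encoding="unicode", default_namespace=ATOM_NS)


def read_atom_entry(xml_text: str) -> dict:
    """Read back the same fields, as any generic Atom client could."""
    entry = ET.fromstring(xml_text)
    return {
        "id": entry.findtext("atom:id", namespaces=NS),
        "title": entry.findtext("atom:title", namespaces=NS),
        "updated": entry.findtext("atom:updated", namespaces=NS),
        "author": entry.findtext("atom:author/atom:name", namespaces=NS),
    }


# Placeholder values for illustration only.
xml_text = build_atom_entry(
    "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a",
    "An Example Thesis",
    "A. Student",
    datetime(2008, 9, 22, 12, 0, tzinfo=timezone.utc),
)
print(read_atom_entry(xml_text)["title"])  # An Example Thesis
```

Nothing in these few lines is repository-specific, which is the appeal of building on a shared protocol layer rather than per-repository APIs.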
I feel there are abstractions that will become more salient as we explore the collaborative space between DSpace and Fedora.

A.) Identification:

I was recently intrigued to find out that Fedora can support identifying its objects with URIs other than the "info:fedora" scheme/namespace.

http://www.fedora.info/definitions/identifiers/
http://oxfordrepo.blogspot.com/2008/01/conclusions-on-uuids-and-local-ids-in.html

I think we are starting to recognize that the identification of resources within a CMS needs to be pluralistic in nature, given that there are so many naming/identification efforts within the community.

B.) Relationships (Ontology):

I think this capability, together with the use of the RELS-EXT RDF datastream as a holding place for the DSpace relationships/metadata found in the DSpace 2.0 ontology, allows a clean alignment between DSpace resources (objects) and Fedora objects, and holds interesting possibilities for bringing DSpace services together on top of Fedora repositories.

The DSpace and Fedora RELS-EXT ontologies offer an opportunity to make the relationships above more explicit. Currently, the RELS-EXT ontology is a closed ontology that does not extend the common properties that already exist in the RDF community. I can see a serious benefit in making the appropriate properties in the RELS-EXT ontology subproperties of dcterms:isPartOf / dcterms:hasPart. Because the DSpace ontology already uses these DCTERMS properties, this would start the two ontologies down the road to alignment. Likewise, there may be benefit in using other shared ontologies (OAI-ORE, for instance) as the basis for expressing relations that represent aggregation/containership. I see many further opportunities to draw equivalences in this area.
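The subproperty alignment proposed above is exactly what RDFS inference can exploit. A triplestore with inference support would do this internally; the Python sketch below simply spells out the two RDFS entailment rules involved (rdfs5: subPropertyOf is transitive; rdfs7: a statement using a subproperty also holds for its superproperty) over plain tuples. The `rdfs:subPropertyOf` axioms linking RELS-EXT properties to `dcterms:isPartOf` are an assumption illustrating the proposal, not part of the published RELS-EXT ontology, and the PIDs and example.org URIs are hypothetical.

```python
from __future__ import annotations

FEDORA_REL = "info:fedora/fedora-system:def/relations-external#"
DCTERMS = "http://purl.org/dc/terms/"
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def rdfs_subproperty_closure(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Forward-chain RDFS rules rdfs5 and rdfs7 until no new triples appear."""
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == SUB_PROPERTY_OF}
        # rdfs5: (a subPropertyOf b) and (b subPropertyOf d) => (a subPropertyOf d)
        new = {(a, SUB_PROPERTY_OF, d) for a, b in sub for c, d in sub if b == c}
        # rdfs7: (s p o) and (p subPropertyOf q) => (s q o)
        new |= {(s, sup, o) for s, p, o in closure for p_sub, sup in sub if p == p_sub}
        if new <= closure:
            return closure
        closure |= new


# Proposed alignment axioms: an ASSUMPTION, not part of the RELS-EXT ontology.
alignment = {
    (FEDORA_REL + "isPartOf", SUB_PROPERTY_OF, DCTERMS + "isPartOf"),
    (FEDORA_REL + "isMemberOfCollection", SUB_PROPERTY_OF, DCTERMS + "isPartOf"),
}

# Hypothetical statements: one as found in a Fedora RELS-EXT datastream,
# one in the DCTERMS style the DSpace ontology uses.
data = {
    ("info:fedora/demo:item1", FEDORA_REL + "isMemberOfCollection", "info:fedora/demo:coll1"),
    ("http://example.org/dspace/item/42", DCTERMS + "isPartOf", "http://example.org/dspace/collection/7"),
}

closure = rdfs_subproperty_closure(data | alignment)

# A single pattern (?s dcterms:isPartOf ?o) now matches statements from both vocabularies.
part_of = sorted((s, o) for s, p, o in closure if p == DCTERMS + "isPartOf")
for s, o in part_of:
    print(s, "isPartOf", o)
```

After inference, one `dcterms:isPartOf` query finds containment statements from both vocabularies, which is the practical payoff of aligning the ontologies rather than querying two application-specific ones.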
Aligning the DSpace and Fedora ontologies will allow us to use the inference capabilities of RDF triplestores such as Mulgara to retrieve equivalent statements made in the two ontologies through a common ontological expression (DC, DCTERMS, ORE, BIBO, etc.), rather than through two separate application-specific ontologies.

Conclusion:

In closing, I'll just reiterate that we now have great opportunities for synergy where before there were barriers. I hope we will see this reflected in the conferences we attend. Attending the JA-SIG conference last spring opened my eyes to how the community can benefit when developers from different projects are placed in a room together to talk about the technologies they use. I think this would make an excellent format for Open Repositories conferences and would benefit our newly formed foundations and communities immensely. Last year we saw BarCamps become popular and take place at Open Repositories; it would be very exciting to see more opportunities for "Un-Conferences" where we can bring together development teams from both communities in a more formal setting to promote collaboration.

Cheers,
Mark Diggory

P.S. This post is my initial foray into the blogging world. I hope to link back to any exciting conversation threads that arise around it on my new blog: http://purl.org/net/mdiggory/blog

~~~~~~~~~~~~~
Mark R. Diggory - DSpace Developer and Systems Manager
MIT Libraries, Systems and Technology Services
Massachusetts Institute of Technology
Home Page: http://purl.org/net/mdiggory/homepage
Blog: http://purl.org/net/mdiggory/blog

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech