[ https://jira.duraspace.org/browse/DS-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=25377#comment-25377 ]

DSpace @ Lyncode edited comment on DS-1202 at 7/2/12 2:35 AM:
--------------------------------------------------------------

Hi Mark Diggory,

1) Yes. The current OAI interface (based on OAICAT) has some issues that cannot 
be solved easily, mainly:

- It doesn't support virtual contexts (this is the major development in XOAI). 
For interoperability under the Driver and OpenAIRE guidelines, the current OAI 
interface is simply not enough. Driver and OpenAIRE have specific metadata 
value formatting requirements that are slightly different from each other. 
Under the OAI-PMH protocol it is incorrect to have a single interface that 
returns metadata in different formats (e.g. date formats, prefixed values, 
suffixed values, and so on); one must have distinct interfaces, each of which 
outputs metadata in the desired value format.

- OAICAT makes some incorrect assumptions (with respect to the OAI-PMH 
protocol): https://jira.duraspace.org/browse/DS-1195
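To illustrate the idea of virtual contexts, here is a minimal sketch in which each context is a separate endpoint with its own value-formatting rule, so a harvester of one endpoint always sees one consistent format. The context names mirror the ones above, but the rules themselves (and the "info:example/..." prefix) are hypothetical, not the actual Driver or OpenAIRE requirements.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class VirtualContexts {
    // Each key stands for one virtual context (a separate OAI endpoint); each value
    // is the formatting rule that context applies to a stored date. Names and rules
    // are illustrative assumptions, not the actual Driver/OpenAIRE requirements.
    static final Map<String, UnaryOperator<String>> DATE_RULES = new LinkedHashMap<>();

    static {
        DATE_RULES.put("request", v -> v);                                       // default: value as stored
        DATE_RULES.put("driver", v -> v.substring(0, 10));                       // date only: yyyy-MM-dd
        DATE_RULES.put("openaire", v -> "info:example/date/" + v.substring(0, 10)); // hypothetical prefixed value
    }

    // Formats a stored date according to the rule of the requested context.
    static String format(String context, String storedDate) {
        UnaryOperator<String> rule = DATE_RULES.get(context);
        if (rule == null) {
            throw new IllegalArgumentException("Unknown context: " + context);
        }
        return rule.apply(storedDate);
    }

    public static void main(String[] args) {
        String stored = "2012-07-02T02:35:00Z";
        // The same item yields one consistent format per endpoint, never a mix.
        for (String context : DATE_RULES.keySet()) {
            System.out.println("/oai/" + context + " -> " + format(context, stored));
        }
    }
}
```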

2) The new Solr core exists because of the "addon development policy". It could 
use the search core instead, but only if the search core can answer a specific 
query, namely:

- Which items have all of their bitstreams freely available for download?
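As a sketch of what that query has to compute, over a hypothetical in-memory model (`Item`, `Bitstream` and the `anonymousRead` flag are assumptions, not DSpace classes): an item qualifies only if it has at least one bitstream and every bitstream is anonymously readable. In a Solr-based design this predicate would presumably be evaluated at indexing time and stored as a field, so the data provider can filter on it directly.

```java
import java.util.List;
import java.util.stream.Collectors;

class OpenAccessQuery {
    // Hypothetical, simplified model: a bitstream is "free" if anonymous users may read it.
    record Bitstream(String name, boolean anonymousRead) {}
    record Item(String handle, List<Bitstream> bitstreams) {}

    // True when the item has at least one bitstream and all of them are freely downloadable.
    // (Stream.allMatch alone would also accept items that have no bitstreams at all.)
    static boolean allBitstreamsOpen(Item item) {
        return !item.bitstreams().isEmpty()
                && item.bitstreams().stream().allMatch(Bitstream::anonymousRead);
    }

    // Returns the handles of the qualifying items.
    static List<String> openItems(List<Item> items) {
        return items.stream()
                .filter(OpenAccessQuery::allBitstreamsOpen)
                .map(Item::handle)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("123456789/1", List.of(new Bitstream("paper.pdf", true))),
                new Item("123456789/2", List.of(new Bitstream("paper.pdf", true),
                                                 new Bitstream("data.zip", false))),
                new Item("123456789/3", List.of()));
        System.out.println(openItems(items)); // [123456789/1]
    }
}
```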

Another question (I didn't look at the current search core implementation, only 
at its schema.xml): does it index all metadata fields, even user-created ones? 
This is an important requirement for this OAI implementation.

3 & 4) The XOAI architecture is based on the idea that all OAI data providers 
share the same core functionality: they implement the OAI-PMH protocol [1]. So 
one just needs to implement the DSpace-specific data source, and the XOAI core 
does the rest. A Spring-based solution is a possibility, but I think this 
approach (also used by OAICAT) is a good way of keeping things simple. 
Considering future versions of the OAI-PMH protocol, this approach also seems 
to be the best, since one just needs to update the core library to reflect 
those changes.
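To make the split concrete, a minimal sketch of a protocol core that dispatches OAI-PMH verbs to a pluggable, repository-specific data source. The `DataSource` and `Core` names, the plain-text responses and the omitted argument validation (e.g. the required `metadataPrefix`) are simplifications assumed for illustration; this is not the XOAI API. Only the `badVerb` error code comes from the OAI-PMH specification.

```java
import java.util.List;
import java.util.Map;

class OaiCoreSketch {
    // Repository-specific part: the only thing a repository would need to implement.
    interface DataSource {
        String repositoryName();
        List<String> setSpecs();
        List<String> identifiers(String setSpec); // null setSpec means the whole repository
    }

    // Generic part: knows the protocol verbs, knows nothing about DSpace.
    static final class Core {
        private final DataSource source;

        Core(DataSource source) { this.source = source; }

        // Dispatches on the "verb" parameter. Output is plain text for brevity;
        // a real data provider produces the OAI-PMH XML response instead.
        String handle(Map<String, String> params) {
            String verb = params.getOrDefault("verb", "");
            switch (verb) {
                case "Identify":
                    return "repositoryName=" + source.repositoryName();
                case "ListSets":
                    return String.join(",", source.setSpecs());
                case "ListIdentifiers":
                    return String.join(",", source.identifiers(params.get("set")));
                default:
                    return "error:badVerb"; // OAI-PMH error code for an illegal or missing verb
            }
        }
    }

    public static void main(String[] args) {
        DataSource demo = new DataSource() {
            public String repositoryName() { return "Demo Repository"; }
            public List<String> setSpecs() { return List.of("col_1", "col_2"); }
            public List<String> identifiers(String set) {
                return set == null ? List.of("oai:demo:1", "oai:demo:2") : List.of("oai:demo:1");
            }
        };
        Core core = new Core(demo);
        System.out.println(core.handle(Map.of("verb", "Identify")));
        System.out.println(core.handle(Map.of("verb", "ListIdentifiers", "set", "col_1")));
        System.out.println(core.handle(Map.of("verb", "Foo")));
    }
}
```

Supporting a future protocol version would then mean changing `Core` only, leaving every `DataSource` implementation untouched.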

--- Getting into the core

XOAI uses a configurable two-phase XSL transformation pipeline:

1 - Metadata values transformation (none, driver, openaire, ...)
2 - Metadata schema transformation (oai_dc, mets, didl, ...)
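A minimal sketch of such a two-phase pipeline with the standard `javax.xml.transform` API: the output of the values stylesheet becomes the input of the schema stylesheet. The `<metadata>/<field name="...">` input format is a hypothetical stand-in for the attached XSD, and the output is a simplified, namespace-free imitation of oai_dc rather than the real schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TwoPhasePipeline {
    // Phase 1 (values): identity copy, but truncate the date value to yyyy-MM-dd.
    static final String VALUES_XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="@*|node()">
            <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
          </xsl:template>
          <xsl:template match="field[@name='date']/text()">
            <xsl:value-of select="substring(., 1, 10)"/>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Phase 2 (schema): map the generic fields to a simplified oai_dc-like record.
    static final String SCHEMA_XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="/metadata">
            <dc>
              <xsl:for-each select="field">
                <xsl:element name="{@name}"><xsl:value-of select="."/></xsl:element>
              </xsl:for-each>
            </dc>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Applies one stylesheet to an XML string and returns the result as a string.
    static String apply(String xsl, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // Runs the values phase first, then feeds its output to the schema phase.
    static String run(String xml) throws TransformerException {
        return apply(SCHEMA_XSL, apply(VALUES_XSL, xml));
    }

    public static void main(String[] args) throws TransformerException {
        String input = "<metadata><field name=\"date\">2012-07-02T02:35:00Z</field>"
                + "<field name=\"title\">Example</field></metadata>";
        System.out.println(run(input));
    }
}
```

Keeping the phases separate means one values stylesheet per context can be combined with any schema stylesheet, instead of writing one stylesheet per (context, format) pair.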

-- XSLT Input

These XSL transformers receive as input an XML file that follows a specific (and 
flexible) schema, which allows the DSpace data source implementation to output 
any kind of information [attached XSD].

-- Data Sources

The DSpace data source is a specific data source implementation. It must provide 
access to all the information OAI-PMH needs:

> Repository Name, Email (Identify)
> Communities & Collections (ListSets)
> Items (ListRecords & ListIdentifiers)
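For the ListSets part, a sketch of how communities and collections might be exposed as sets. The "com_"/"col_" prefixes and the `Container` model are assumptions made for illustration; the one firm constraint is that OAI-PMH setSpec values are limited to unreserved URI characters (with ':' reserved for hierarchy), so the '/' in a handle has to be replaced.

```java
import java.util.List;
import java.util.stream.Collectors;

class SetSpecSketch {
    enum Kind { COMMUNITY, COLLECTION }

    // Simplified, hypothetical model of a DSpace community or collection.
    record Container(Kind kind, String handle, String name) {}

    // One entry of a ListSets response.
    record OaiSet(String setSpec, String setName) {}

    // Assumed naming scheme: a kind prefix plus the handle, with '/' (not allowed
    // in a setSpec) replaced by '_'.
    static String setSpec(Container c) {
        String prefix = c.kind() == Kind.COMMUNITY ? "com_" : "col_";
        return prefix + c.handle().replace('/', '_');
    }

    // Builds the ListSets entries for all communities and collections.
    static List<OaiSet> listSets(List<Container> containers) {
        return containers.stream()
                .map(c -> new OaiSet(setSpec(c), c.name()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Container> containers = List.of(
                new Container(Kind.COMMUNITY, "123456789/1", "Research"),
                new Container(Kind.COLLECTION, "123456789/2", "Theses"));
        listSets(containers).forEach(s -> System.out.println(s.setSpec() + "  " + s.setName()));
    }
}
```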

-- Configuration

XOAI provides the concept of a Filter: one can associate filters with sets, and 
when a set is requested (set=<setSpec>), those filters are triggered, resulting 
in a specific data source query. Filters are specific class implementations that 
extend the AbstractFilter class [2]; the DSpace-specific filters can be found at 
[3]. Filters can also be associated with metadata formats and contexts.
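A sketch of the filter idea: each filter contributes one query clause, and requesting a set ANDs together the clauses of the filters associated with it. `SketchFilter` is a hypothetical stand-in for a filter base class (not XOAI's `AbstractFilter`), and the Solr-style field names `collection` and `allBitstreamsOpen` are invented for illustration; `*:*` is Solr's match-all query.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FilterSketch {
    // Hypothetical filter base class: each filter contributes one query clause.
    abstract static class SketchFilter {
        abstract String toQuery();
    }

    // Restricts results to one collection (field name is hypothetical).
    static final class InCollection extends SketchFilter {
        private final String handle;
        InCollection(String handle) { this.handle = handle; }
        @Override String toQuery() { return "collection:\"" + handle + "\""; }
    }

    // Restricts results to items whose bitstreams are all open (field name is hypothetical).
    static final class OpenAccessOnly extends SketchFilter {
        @Override String toQuery() { return "allBitstreamsOpen:true"; }
    }

    // Configuration stand-in: which filters each requested setSpec triggers.
    static final Map<String, List<SketchFilter>> SET_FILTERS = Map.of(
            "col_123456789_2", List.of(new InCollection("123456789/2")),
            "openaccess", List.of(new OpenAccessOnly()),
            "theses_open", List.of(new InCollection("123456789/2"), new OpenAccessOnly()));

    // Builds the data source query for a request; no set means "everything".
    static String queryFor(String setSpec) {
        if (setSpec == null) {
            return "*:*";
        }
        List<SketchFilter> filters = SET_FILTERS.get(setSpec);
        if (filters == null) {
            throw new IllegalArgumentException("Unknown set: " + setSpec);
        }
        return filters.stream()
                .map(f -> "(" + f.toQuery() + ")")
                .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        System.out.println(queryFor("theses_open"));
        // (collection:"123456789/2") AND (allBitstreamsOpen:true)
    }
}
```

The same mechanism could attach filters to a metadata format or a context, by keying another map on those instead of on the setSpec.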

The current configuration can be found at [4] (a Spring-based concept).

--- Resources

[1] http://www.openarchives.org/OAI/openarchivesprotocol.html
[2] https://github.com/lyncode/xoai-common/blob/master/src/main/java/com/lyncode/xoai/common/dataprovider/filter/AbstractFilter.java
[3] https://github.com/lyncode/DSpace/tree/dspace-with-xoai/dspace-xoai/dspace-xoai-api/src/main/java/org/dspace/xoai/filter
[4] https://github.com/lyncode/DSpace/blob/dspace-with-xoai/dspace/config/modules/xoai/xoai.xml


PS - I would like to discuss a specific OpenAIRE requirement with you. OpenAIRE 
takes the embargo end date into account, but Dublin Core does not provide a 
specific field for it. The DSpace embargo feature, for example, requires the 
user to define such a field. I think it's important for the DSpace community to 
somehow have more "control" over the possible metadata fields. Just giving it a 
shot: why not produce a specific schema that extends DC (as OAI-PMH did with 
oai_dc)? By default, DSpace development is limited to the information in the DC 
schema, which I think represents a huge limitation (DC is getting older, and 
there are some needs that could be fulfilled by extending it).
                
> DSpace XOAI Data Provider
> -------------------------
>
>                 Key: DS-1202
>                 URL: https://jira.duraspace.org/browse/DS-1202
>             Project: DSpace
>          Issue Type: New Feature
>          Components: OAI-PMH
>            Reporter: DSpace @ Lyncode
>            Priority: Major
>              Labels: oai
>
> DSpace XOAI Data Provider is an OAI-PMH interface for DSpace based on XOAI 
> (an OAI-PMH Java toolkit), with the following characteristics:
> - OpenAIRE compliant
> - Driver compliant
> - Default context (same behavior as the original DSpace OAI interface)
> - Completely configurable
> - Fast (based on Solr, with caching)
> - Extensible

_______________________________________________
Dspace-devel mailing list
https://lists.sourceforge.net/lists/listinfo/dspace-devel
