Hi Michael,

impressive detective work!

Maybe an additional piece of the puzzle: I don't think that default DSpace
exposes the direct bitstream links in OAI-PMH.

One place where I know this has been added is the RIOXX patch:
https://github.com/atmire/RIOXX57/blob/master/dspace/modules/additions/src/main/java/org/dspace/xoai/util/ItemUtils.java#L219

The RIOXX patch itself is only compatible with DSpace 5.x; a DSpace 6.x
compatible version of the patch has yet to be made.

Maybe what you found is, in effect, an incompatibility between the RIOXX
patch for DSpace 5 and DSpace 6, which would potentially affect everyone
with the RIOXX patch attempting an upgrade to DSpace 6?

best regards,

Bram

[image: logo] Bram Luyten
250-B Suite 3A, Lucius Gordon Drive, West Henrietta, NY 14586
Gaston Geenslaan 14, 3001 Leuven, Belgium
atmire.com
<http://atmire.com/website/?q=services&utm_source=emailfooter&utm_medium=email&utm_campaign=braml>


On Fri, 28 Feb 2020 at 10:28, Michael White <[email protected]>
wrote:

> Thanks Bram,
>
>
>
> > Conclusion: Can't reproduce
>
>
>
> I’ve had a poke about in those 2 repositories, and I’ve also not been able
> to reproduce the issue in either repository (but my investigation wasn’t
> particularly exhaustive!) – having said that, the issue is with the link to
> the bitstream in the OAI-PMH output, but, as far as I could tell, the link
> to the bitstream isn’t included in the OAI-PMH output from either of those
> repositories . . . (?)
>
>
>
> However, I did some further investigation into the issue I’m seeing with
> our repository last night and believe that I have identified a bug (at
> least that is what it looks like to me!).
>
>
>
> Firstly I note that the format of the link to the bitstream that appears
> on the Item View page in our Repository has 2 distinct forms – e.g.:
>
>
>
> https://dspace.stir.ac.uk/handle/1893/58 has a bitstream link of the
> form: https://dspace.stir.ac.uk/bitstream/1893/58/1/Thesis.pdf
>
> https://dspace.stir.ac.uk/handle/1893/30142 has a bitstream link of the
> form:
> https://dspace.stir.ac.uk/retrieve/17570e9c-aa29-4c15-99b2-af5892853652/Revisions_Final_Chronic_wounds.pdf
>
>
>
> - however, for the latter, the bitstream link that appears in the OAI-PMH (
> https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/30142&metadataPrefix=oai_dc)
> has the form:
> http://dspace.stir.ac.uk/bitstream/1893/30142/-1/Revisions_Final_Chronic_wounds.pdf
>
>
>
> - i.e. the URL for this bitstream is being rendered with a URL of the
> “wrong” format.
>
>
>
> The key observation here is that the Sequence ID (sid) that appears in
> that URL has the value “-1” (which is a “non-value”). Looking in the
> database, I can see that we have 3390 bitstreams with a sid value of “-1”.
> I can’t be 100% certain, but I’m guessing this coincides with the records
> that have been added to the system since we upgraded from v4 to v6.2 a
> couple of years ago . . . (?)
>
>
>
> So, the bug . . .
>
>
>
> Looking at the code that renders the bitstream link in the JSPUI (in
> dspace-6.2-src-release/dspace-jspui/src/main/java/org/dspace/app/webui/jsptag/ItemTag.java,
> line 1054), I can see that it first checks the sid – if it is > 0 then the
> link is rendered in the first format, but if it is <= 0 (i.e. “-1”), then
> the link is rendered in the second format:
>
>
>
> if ((handle != null) && (b.getSequenceID() > 0)) {
>     bsLink = bsLink + "/bitstream/"
>             + item.getHandle() + "/"
>             + b.getSequenceID() + "/";
> } else {
>     bsLink = bsLink + "/retrieve/"
>             + b.getID() + "/";
> }
>
>
>
> However, it looks to me like the code that renders the bitstream links for
> inclusion in OAI-PMH output (in
> dspace-6.2-src-release/dspace-oai/src/main/java/org/dspace/xoai/util/ItemUtils.java,
> line 211) doesn’t do this check and simply renders the link in the first
> format regardless:
>
>
>
> if (handle != null && baseUrl != null) {
>     url = baseUrl + "/bitstream/"
>             + handle + "/"
>             + sid + "/"
>             + URLUtils.encode(bsName);
> }
>
>
>
> Therefore, my current thought is that if I replace the if statement above
> with:
>
>
>
> if (handle != null && baseUrl != null) {
>     // Updated code to handle both SID and UUID type bitstream URLs - MW: 27/2/20
>     if (bit.getSequenceID() > 0) {
>         url = baseUrl + "/bitstream/"
>                 + handle + "/"
>                 + sid + "/"
>                 + URLUtils.encode(bsName);
>     } else {
>         url = baseUrl + "/retrieve/"
>                 + bit.getID() + "/"
>                 + URLUtils.encode(bsName);
>     }
> }
>
>
>
> - then the URLs that are rendered in the OAI-PMH output should be correct
> for both cases.
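
To make the proposed check easy to try outside DSpace, here is a minimal,
standalone sketch of the same branching. The class and method names
(BitstreamUrlSketch, bitstreamUrl) and the parameter list are hypothetical,
and java.net.URLEncoder stands in for DSpace's URLUtils.encode, so the
encoding of special characters may differ from the real OAI output.
Returning an empty string when the handle or base URL is missing is also an
assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical standalone sketch; not the actual DSpace ItemUtils code.
final class BitstreamUrlSketch {

    // Builds a bitstream link in one of the two formats seen in the JSPUI:
    //   sequence ID > 0  -> <baseUrl>/bitstream/<handle>/<sid>/<name>
    //   otherwise        -> <baseUrl>/retrieve/<bitstream id>/<name>
    static String bitstreamUrl(String baseUrl, String handle, int sequenceId,
                               String bitstreamId, String name) {
        if (handle == null || baseUrl == null) {
            return ""; // assumption: leave the URL empty when it cannot be built
        }
        String encodedName = encode(name);
        if (sequenceId > 0) {
            return baseUrl + "/bitstream/" + handle + "/" + sequenceId + "/" + encodedName;
        }
        return baseUrl + "/retrieve/" + bitstreamId + "/" + encodedName;
    }

    // Stand-in for URLUtils.encode (assumption: behaviour may differ).
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String base = "https://dspace.stir.ac.uk";
        // Older bitstream with a positive sequence ID.
        System.out.println(bitstreamUrl(base, "1893/58", 1,
                "unused-id", "Thesis.pdf"));
        // Newer bitstream whose sequence ID is -1.
        System.out.println(bitstreamUrl(base, "1893/30142", -1,
                "17570e9c-aa29-4c15-99b2-af5892853652",
                "Revisions_Final_Chronic_wounds.pdf"));
    }
}
```

With the two example items from this thread, the first call should yield
the /bitstream/ form and the second the /retrieve/ form.
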
>
>
>
> My next step is to try applying this fix in our DEV system and see if it
> works as I expect, but I’d be interested to know if others agree with my
> analysis (and proposed fix), or if I’ve missed anything, or I’m
> proposing/doing anything daft!
>
>
>
> Cheers,
>
>
>
> Mike
>
>
>
>
> *Michael White Senior Developer*
>
>
> *Business Applications and Integrations Information Services*
>
>
> 4B19, Cottrell
>
> University of Stirling
> Stirling
> FK9 4LA
>
> Tel:  +44 (0)1786 466877
> Email:  [email protected]
> Web: stir.ac.uk/informationservices
> <http://www.stir.ac.uk/informationservices>
>
> <https://www.facebook.com/stirlinglibrary/>
> <https://twitter.com/isstirling> <https://www.instagram.com/isstirling/>
> <https://www.youtube.com/user/infoservicesatstir>
>
> [image: Banner] <https://www.stir.ac.uk/>
>
>
>
>
>
> *From:* Bram Luyten <[email protected]>
> *Sent:* 28 February 2020 08:40
> *To:* Michael White <[email protected]>
> *Cc:* DSpace Tech <[email protected]>
> *Subject:* Re: [dspace-tech] Incorrect bitstream URLs in OAI-PMH output?
>
>
>
> Hi Michael,
>
>
>
> thank you for reporting/sharing this.
>
>
>
> Not a solution, but I wanted to share two observations to narrow the
> problem down.
>
>
>
> *DSpace 6.3 XMLUI - Fresh install*
>
>
>
> Item link: https://repository.openpolytechnic.ac.nz/handle/11072/128
>
> Bitstream link:
> https://repository.openpolytechnic.ac.nz/bitstream/handle/11072/128/Curry_2002%20-%20Working%20Papers%20-%20res_wp602curryl.pdf?sequence=1&isAllowed=y
>
>
>
> Conclusion: Can't reproduce
>
>
>
> *DSpace 6.3 XMLUI - Upgraded instance & item that already exists
> pre-upgrade*
>
>
>
> Item link: https://ramscholar.dspace-express.com/handle/10675.1/96
>
> Bitstream link:
> https://ramscholar.dspace-express.com/bitstream/handle/10675.1/96/sept%2011%2c%202006.pdf?sequence=1&isAllowed=y
>
>
>
> Conclusion: Can't reproduce
>
>
>
> Both of these installations have OAI enabled, but I didn't have the time
> to look at the records there:
>
>
> https://repository.openpolytechnic.ac.nz/oai/request?verb=ListRecords&metadataPrefix=oai_dc
>
>
> https://ramscholar.dspace-express.com/oai/request?verb=ListRecords&metadataPrefix=oai_dc
>
>
>
> Hope this helps!! Would be interested in learning whether this is specific
> to your institution/customization, JSPUI specific, ... as it may affect
> others as well !!
>
>
>
> with kindest regards,
>
>
>
> Bram
>
>
>
>
>
>
>
> On Thu, 27 Feb 2020 at 12:52, Michael White <[email protected]>
> wrote:
>
> Hi,
>
>
>
> We’re using DSpace v6.2, JSPUI.
>
>
>
> Whilst troubleshooting an issue with a large number of broken full text
> links harvested from our repository via OAI-PMH by the CORE service, CORE
> reported to us that "the provided full text link in the OAI-PMH
> dc:identifier field is broken."
>
>
>
> For example, for this item in our repository:
>
>
>
> https://dspace.stir.ac.uk/handle/1893/30142
>
>
>
> - the link to the associated bitstream from this repository record is:
>
>
>
>
> https://dspace.stir.ac.uk/retrieve/17570e9c-aa29-4c15-99b2-af5892853652/Revisions_Final_Chronic_wounds.pdf
>
>
>
> - however, if harvested via OAI-PMH:
>
>
>
>
> https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/30142&metadataPrefix=oai_dc
>
>
>
> - then the bitstream link in dc.identifier is wrong:
>
>
>
> <dc:identifier>
> http://dspace.stir.ac.uk/bitstream/1893/30142/-1/Revisions_Final_Chronic_wounds.pdf
> </dc:identifier>
>
>
>
> - i.e. it contains "-1" where I'd expect to see the bitstream UUID.
>
>
>
> And looking at the "raw" XOAI output, it appears to be wrong there too (so
> not an issue with the oai_dc crosswalk?):
>
>
>
>
> https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/30142&metadataPrefix=xoai
>
>
>
> <field name="url">
> http://dspace.stir.ac.uk/bitstream/1893/30142/-1/Revisions_Final_Chronic_wounds.pdf
> </field>
>
>
>
> However, a large number of the OAI-PMH bitstream links do work - e.g.:
>
>
>
>
> https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/58&metadataPrefix=oai_dc
>
>
>
> - includes the correct bitstream URL:
>
>
>
> <dc:identifier>http://dspace.stir.ac.uk/bitstream/1893/58/1/Thesis.pdf
> </dc:identifier>
>
>
>
> I've tried clearing the cache, and rebuilding the OAI-PMH index, but this
> issue remains. I also searched the Mailing list archives and JIRA, but
> couldn't find anything that seemed to relate to this problem.
>
>
>
> I'm not sure, but my current working theory is that links to "older"
> bitstreams do work because they relate to records added to the repository
> before the upgrade that moved DSpace from using numeric IDs to UUIDs - but
> records added since then, that make use of UUIDs, don't work . . . . (but I
> haven't managed to prove this theory yet!).
>
>
>
> Has anyone else come across this? Does anyone know of a solution (I'm
> happy to hack code/apply patches if required)?
>
>
>
> If you're on this version of DSpace, are all the bitstream URLs harvested
> via OAI-PMH from your repository correct?
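
One way to check harvested records for this symptom is to look for a
non-positive sequence ID in the link. A rough sketch, assuming JSPUI-style
links of the form /bitstream/<handle prefix>/<handle suffix>/<sid>/<file
name>; the class and method names are hypothetical and this is not part of
DSpace:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for spotting links like .../bitstream/1893/30142/-1/file.pdf
final class HarvestedUrlCheck {

    // Captures the sequence ID segment that sits between the handle and the file name.
    private static final Pattern BITSTREAM_URL =
            Pattern.compile("/bitstream/[^/]+/[^/]+/(-?\\d{1,9})/[^/]+$");

    // Returns true only for sequence-style links whose sequence ID is <= 0.
    // Links in any other format (e.g. /retrieve/<id>/...) are not flagged.
    static boolean hasInvalidSequenceId(String url) {
        Matcher m = BITSTREAM_URL.matcher(url.trim());
        if (!m.find()) {
            return false;
        }
        return Integer.parseInt(m.group(1)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(hasInvalidSequenceId(
                "http://dspace.stir.ac.uk/bitstream/1893/30142/-1/Revisions_Final_Chronic_wounds.pdf"));
        System.out.println(hasInvalidSequenceId(
                "http://dspace.stir.ac.uk/bitstream/1893/58/1/Thesis.pdf"));
    }
}
```

Running this over the dc:identifier values of a harvest should give a
quick count of affected records, under the link-format assumption above.
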
>
>
>
> If anyone has any fixes, thoughts, observations etc, they would be most
> welcome as I'm currently at a loss as to how to resolve this and, given the
> importance of CORE for supporting the upcoming REF here in the UK, my
> library colleagues are getting a bit jumpy ;-).
>
>
>
> Cheers,
>
>
>
> Mike
>
>
>
>
>
>
>
>
> ------------------------------
>
> The University achieved an overall 5 stars in the QS World University
> Rankings 2018
>
> The University of Stirling is a charity registered in Scotland, number SC
> 011159.
>
> --
> All messages to this mailing list should adhere to the DuraSpace Code of
> Conduct: https://duraspace.org/about/policies/code-of-conduct/
> ---
> You received this message because you are subscribed to the Google Groups
> "DSpace Technical Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-tech/AM6PR03MB5560135CD01E0191E3F1C1C0D4EB0%40AM6PR03MB5560.eurprd03.prod.outlook.com
> <https://groups.google.com/d/msgid/dspace-tech/AM6PR03MB5560135CD01E0191E3F1C1C0D4EB0%40AM6PR03MB5560.eurprd03.prod.outlook.com?utm_medium=email&utm_source=footer>
> .
>
>

