Thanks Bram,

> Conclusion: Can't reproduce

I’ve had a poke about in those 2 repositories and haven’t been able to 
reproduce the issue in either of them (though my investigation wasn’t 
particularly exhaustive!). Having said that, the issue is with the link to the 
bitstream in the OAI-PMH output and, as far as I could tell, neither of those 
repositories includes a link to the bitstream in its OAI-PMH output . . . (?)

However, I did some further investigation into the issue I’m seeing with our 
repository last night and believe that I have identified a bug (at least that 
is what it looks like to me!).

Firstly I note that the format of the link to the bitstream that appears on the 
Item View page in our Repository has 2 distinct forms – e.g.:

https://dspace.stir.ac.uk/handle/1893/58 has a bitstream link of the form: 
https://dspace.stir.ac.uk/bitstream/1893/58/1/Thesis.pdf
https://dspace.stir.ac.uk/handle/1893/30142 has a bitstream link of the form: 
https://dspace.stir.ac.uk/retrieve/17570e9c-aa29-4c15-99b2-af5892853652/Revisions_Final_Chronic_wounds.pdf

- however, for the latter, the bitstream link that appears in the OAI-PMH output 
(https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/30142&metadataPrefix=oai_dc) 
has the form: 
http://dspace.stir.ac.uk/bitstream/1893/30142/-1/Revisions_Final_Chronic_wounds.pdf

- i.e. the URL for this bitstream is being rendered in the “wrong” format.

The key observation here is that the Sequence ID (sid) that appears in that URL 
has the value “-1” (which is a “non-value”). Looking in the database, I can see 
that we have 3390 bitstreams with a sid value of “-1”. I can’t be 100% certain, 
but I’m guessing these coincide with the records that have been added to the 
system since we upgraded from v4 to v6.2 a couple of years ago . . . (?)
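
For what it’s worth, the symptom described above can be spotted mechanically in 
harvested links. This is only a minimal sketch: it assumes JSPUI-style URLs of 
the form /bitstream/<handle prefix>/<handle suffix>/<sid>/<filename>, and the 
class and method names are made up for illustration (they are not part of 
DSpace):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not DSpace code: flags harvested bitstream URLs
// whose sequence ID segment is not a positive number (e.g. "-1").
final class HarvestedLinkCheck {

    // Assumed URL shape: .../bitstream/<prefix>/<suffix>/<sid>/<filename>
    private static final Pattern BITSTREAM_URL =
            Pattern.compile("/bitstream/[^/]+/[^/]+/(-?\\d{1,9})/[^/]+$");

    static boolean hasInvalidSequence(String url) {
        Matcher m = BITSTREAM_URL.matcher(url);
        // URLs that do not match the assumed shape (e.g. /retrieve/ links)
        // are not flagged.
        return m.find() && Integer.parseInt(m.group(1)) <= 0;
    }
}
```

Calling HarvestedLinkCheck.hasInvalidSequence(...) on each dc:identifier value 
from a harvest would give a rough count of affected records to compare with the 
database figure.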

So, the bug . . .

Looking at the code that renders the bitstream link in the JSPUI (in 
dspace-6.2-src-release/dspace-jspui/src/main/java/org/dspace/app/webui/jsptag/ItemTag.java,
 line 1054), I can see that it first checks the sid – if it is > 0 then the 
link is rendered in the first format, but if it is <= 0 (i.e. “-1”), then the 
link is rendered in the second format:

if ((handle != null) && (b.getSequenceID() > 0)) {
    bsLink = bsLink + "/bitstream/"
            + item.getHandle() + "/"
            + b.getSequenceID() + "/";
} else {
    bsLink = bsLink + "/retrieve/"
            + b.getID() + "/";
}

However, it looks to me like the code that renders the bitstream links for 
inclusion in OAI-PMH output (in 
dspace-6.2-src-release/dspace-oai/src/main/java/org/dspace/xoai/util/ItemUtils.java,
 line 211) doesn’t do this check and simply renders the link in the first 
format regardless:

if (handle != null && baseUrl != null) {
    url = baseUrl + "/bitstream/"
            + handle + "/"
            + sid + "/"
            + URLUtils.encode(bsName);
}

Therefore, my current thought is that if I replace the if statement above with:

// Updated code to handle both SID and UUID type bitstream URLs - MW: 27/2/20
if (handle != null && baseUrl != null) {
    if (bit.getSequenceID() > 0) {
        // Valid sequence ID: use the handle-based /bitstream/ URL
        url = baseUrl + "/bitstream/"
                + handle + "/"
                + sid + "/"
                + URLUtils.encode(bsName);
    } else {
        // No valid sequence ID (e.g. -1): use the UUID-based /retrieve/ URL
        url = baseUrl + "/retrieve/"
                + bit.getID() + "/"
                + URLUtils.encode(bsName);
    }
}

- then the URLs that are rendered in the OAI-PMH output should be correct for 
both cases.
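
To sanity-check the branching before touching DEV, the same logic can be 
exercised outside DSpace. The following is just a standalone sketch under 
stated assumptions: the class, the buildUrl signature and the encode helper are 
hypothetical stand-ins (in particular, encode is not DSpace’s URLUtils.encode), 
and returning null when handle or baseUrl is missing is a placeholder choice 
rather than what ItemUtils does:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.UUID;

// Standalone sketch of the proposed sid check; no DSpace classes involved.
final class BitstreamUrlSketch {

    // Hypothetical helper mirroring the proposed if/else.
    static String buildUrl(String baseUrl, String handle, int sequenceId,
                           UUID bitstreamId, String name) {
        if (handle == null || baseUrl == null) {
            return null; // placeholder: the real code leaves url unchanged here
        }
        if (sequenceId > 0) {
            // Valid sequence ID: handle-based link
            return baseUrl + "/bitstream/" + handle + "/" + sequenceId + "/"
                    + encode(name);
        }
        // Non-positive sequence ID (e.g. -1): UUID-based link
        return baseUrl + "/retrieve/" + bitstreamId + "/" + encode(name);
    }

    // Stand-in for URLUtils.encode: percent-encodes using UTF-8.
    static String encode(String s) {
        try {
            // URLEncoder writes spaces as '+'; convert them to %20 for paths.
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With the two example items from our repository, a sid of 1 and the name 
Thesis.pdf should give the /bitstream/1893/58/1/ form, while a sid of -1 
together with the bitstream UUID should give the /retrieve/ form that the Item 
View page shows.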

My next step is to try applying this fix in our DEV system and see if it works 
as I expect, but I’d be interested to know if others agree with my analysis 
(and proposed fix), or if I’ve missed anything, or I’m proposing/doing anything 
daft!

Cheers,

Mike

Michael White
Senior Developer
Business Applications and Integrations
Information Services

4B19, Cottrell
University of Stirling
Stirling
FK9 4LA

Tel:  +44 (0)1786 466877
Email:  [email protected]
Web: http://www.stir.ac.uk/informationservices


From: Bram Luyten <[email protected]>
Sent: 28 February 2020 08:40
To: Michael White <[email protected]>
Cc: DSpace Tech <[email protected]>
Subject: Re: [dspace-tech] Incorrect bitstream URLs in OAI-PMH output?

Hi Michael,

thank you for reporting/sharing this.

Not a solution, but I wanted to share two observations to narrow the problem 
down.

DSpace 6.3 XMLUI - Fresh install

Item link: https://repository.openpolytechnic.ac.nz/handle/11072/128
Bitstream link: 
https://repository.openpolytechnic.ac.nz/bitstream/handle/11072/128/Curry_2002%20-%20Working%20Papers%20-%20res_wp602curryl.pdf?sequence=1&isAllowed=y

Conclusion: Can't reproduce

DSpace 6.3 XMLUI - Upgraded instance & item that already exists pre-upgrade

Item link: https://ramscholar.dspace-express.com/handle/10675.1/96
Bitstream link: 
https://ramscholar.dspace-express.com/bitstream/handle/10675.1/96/sept%2011%2c%202006.pdf?sequence=1&isAllowed=y

Conclusion: Can't reproduce

Both of these installations have OAI enabled, but I didn't have the time to 
look at the records there:
https://repository.openpolytechnic.ac.nz/oai/request?verb=ListRecords&metadataPrefix=oai_dc
https://ramscholar.dspace-express.com/oai/request?verb=ListRecords&metadataPrefix=oai_dc

Hope this helps!! Would be interested in learning whether this is specific to 
your institution/customization, JSPUI specific, ... as it may affect others as 
well !!

with kindest regards,

Bram

Bram Luyten
250-B Suite 3A, Lucius Gordon Drive, West Henrietta, NY 14586
Gaston Geenslaan 14, 3001 Leuven, Belgium
atmire.com


On Thu, 27 Feb 2020 at 12:52, Michael White <[email protected]> wrote:
Hi,

We’re using DSpace v6.2, JSPUI.

Whilst troubleshooting an issue with a large number of broken full text links 
harvested from our repository via OAI-PMH by the CORE service, CORE reported to 
us that "the provided full text link in the OAI-PMH dc:identifier field is 
broken."

For example, for this item in our repository:

https://dspace.stir.ac.uk/handle/1893/30142

- the link to the associated bitstream from this repository record is:

https://dspace.stir.ac.uk/retrieve/17570e9c-aa29-4c15-99b2-af5892853652/Revisions_Final_Chronic_wounds.pdf

- however, if harvested via OAI-PMH:

https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/30142&metadataPrefix=oai_dc

- then the bitstream link in dc.identifier is wrong:

<dc:identifier>http://dspace.stir.ac.uk/bitstream/1893/30142/-1/Revisions_Final_Chronic_wounds.pdf</dc:identifier>

- i.e. it contains "-1" where I'd expect to see the bitstream UUID.

And looking at the "raw" XOAI output, it appears to be wrong there too (so not 
an issue with the oai_dc crosswalk?):

https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/30142&metadataPrefix=xoai

<field name="url">http://dspace.stir.ac.uk/bitstream/1893/30142/-1/Revisions_Final_Chronic_wounds.pdf</field>

However, a large number of the OAI-PMH bitstream links do work - e.g.:

https://dspace.stir.ac.uk/oai/request?verb=GetRecord&identifier=oai:dspace.stir.ac.uk:1893/58&metadataPrefix=oai_dc

- includes the correct bitstream URL:

<dc:identifier>http://dspace.stir.ac.uk/bitstream/1893/58/1/Thesis.pdf</dc:identifier>

I've tried clearing the cache, and rebuilding the OAI-PMH index, but this issue 
remains. I also searched the Mailing list archives and JIRA, but couldn't find 
anything that seemed to relate to this problem.

I'm not sure, but my current working theory is that links to "older" bitstreams 
do work because they relate to records added to the repository before the 
upgrade that moved DSpace from using numeric IDs to UUIDs - but records added 
since then, that make use of UUIDs, don't work . . . . (but I haven't managed 
to prove this theory yet!).

Has anyone else come across this? Does anyone know of a solution (I'm happy to 
hack code/apply patches if required)?

If you're on this version of DSpace, are all the bitstream URLs harvested via 
OAI-PMH from your repository correct?

If anyone has any fixes, thoughts, observations etc, they would be most welcome 
as I'm currently at a loss as to how to resolve this and, given the importance 
of CORE for supporting the upcoming REF here in the UK, my library colleagues 
are getting a bit jumpy ;-).

Cheers,

Mike

________________________________
The University achieved an overall 5 stars in the QS World University Rankings 
2018
The University of Stirling is a charity registered in Scotland, number SC 
011159.
--
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/AM6PR03MB5560135CD01E0191E3F1C1C0D4EB0%40AM6PR03MB5560.eurprd03.prod.outlook.com.

