We've recently found that Google Scholar's crawler has a lot of trouble
matching PDF URLs to the correct repository record (the landing page with
the associated metadata) because of redirects in place for the PDF URLs.

For example, https://qspace.library.queensu.ca/handle/1974/28134 lists
the following in its meta tags:

<meta
  content="https://qspace.library.queensu.ca/bitstream/1974/28134/10/Public-Water-Covid-19.pdf"
  name="citation_pdf_url">

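The value a crawler picks up can be checked by pulling `citation_pdf_url` out of the landing page HTML. A minimal sketch using Python's standard `html.parser`; `find_citation_pdf_url` is a hypothetical helper, and the sample string is a trimmed stand-in for the real page, not its actual markup.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collects the content of every <meta name="citation_pdf_url"> tag."""

    def __init__(self):
        super().__init__()
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "citation_pdf_url" and attributes.get("content"):
            self.pdf_urls.append(attributes["content"])


def find_citation_pdf_url(html):
    """Return the citation_pdf_url values declared in an HTML page."""
    parser = CitationMetaParser()
    parser.feed(html)
    parser.close()
    return parser.pdf_urls


# Trimmed stand-in for the item page's <head>, not the real markup.
sample = (
    "<html><head><meta "
    'content="https://qspace.library.queensu.ca/bitstream/1974/28134/10/Public-Water-Covid-19.pdf" '
    'name="citation_pdf_url"></head></html>'
)
print(find_citation_pdf_url(sample))
```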
When the crawler requests it, however, this PDF URL redirects to another URL:

Fetched Header
Permanent redirect (301) to 
https://qspace.library.queensu.ca/bitstream/handle/1974/28134/Public-Water-Covid-19.pdf;jsessionid=3681375205557337FE2EA1F5058CED47?sequence=10
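To see the response the crawler describes, the PDF URL can be requested once with redirects disabled. A minimal sketch using only Python's standard library; `NoRedirectHandler` and `check_redirect` are hypothetical names, and Google's crawler may send different headers (cookies, user agent) than this script does, so results could differ.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the first response can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def check_redirect(url, timeout=10):
    """Return (status, Location header or None) for a single GET of url."""
    opener = urllib.request.build_opener(NoRedirectHandler)
    try:
        with opener.open(url, timeout=timeout) as response:
            # 2xx: the file is served directly, no redirect.
            return response.status, None
    except urllib.error.HTTPError as err:
        # With redirects refused, a 3xx surfaces here as an HTTPError.
        return err.code, err.headers.get("Location")
```

Calling `check_redirect` on the `citation_pdf_url` above should return a 3xx status plus a `Location` header while the redirect is in place, and `(200, None)` once the file is served directly.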

The indexing system flags this as suspicious behavior (cloaking), so the
redirect is not followed.
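Setting the session ID aside, the redirect target appears to be the same file under a different URL layout. The sketch below normalizes both URLs to a (handle, sequence, filename) key to check this; the two regular expressions cover only the layouts quoted in this message and are assumptions, not DSpace's own URL rules, and `bitstream_key` is a hypothetical helper.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed patterns for the two bitstream URL layouts quoted in this message:
#   /bitstream/<prefix>/<suffix>/<sequence>/<filename>
#   /bitstream/handle/<prefix>/<suffix>/<filename>?sequence=<sequence>
SEQUENCE_IN_PATH = re.compile(r"^/bitstream/(\d+)/(\d+)/(\d+)/([^/]+)$")
SEQUENCE_IN_QUERY = re.compile(r"^/bitstream/handle/(\d+)/(\d+)/([^/]+)$")


def bitstream_key(url):
    """Return (handle, sequence, filename) for either layout, or None."""
    parts = urlsplit(url)
    # urlsplit leaves ';jsessionid=...' in the path; drop it before matching.
    path = parts.path.split(";", 1)[0]
    match = SEQUENCE_IN_PATH.match(path)
    if match:
        prefix, suffix, sequence, filename = match.groups()
        return (f"{prefix}/{suffix}", sequence, filename)
    match = SEQUENCE_IN_QUERY.match(path)
    if match:
        prefix, suffix, filename = match.groups()
        sequence = parse_qs(parts.query).get("sequence", [None])[0]
        return (f"{prefix}/{suffix}", sequence, filename)
    return None


declared = "https://qspace.library.queensu.ca/bitstream/1974/28134/10/Public-Water-Covid-19.pdf"
location = (
    "https://qspace.library.queensu.ca/bitstream/handle/1974/28134/"
    "Public-Water-Covid-19.pdf;jsessionid=3681375205557337FE2EA1F5058CED47?sequence=10"
)
print(bitstream_key(declared) == bitstream_key(location))  # True
```

The keys match even though the URL strings differ, which is consistent with a crawler treating the 301 as a jump to a different URL rather than to the declared PDF.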

I looked through the DSpace documentation and found this page:
https://wiki.lyrasis.org/display/DSDOC5x/Search+Engine+Optimization#SearchEngineOptimization-AvoidredirectingfiledownloadstoItemlandingpages

which states:

Make sure that you never redirect "direct file downloads" (i.e. users who 
directly jump to downloading a file, often from a search engine) to the 
associated Item's splash/landing page.  In the past, some DSpace sites have 
added these custom URL redirects in order to facilitate capturing 
statistics via Google Analytics or similar.

We've never added those redirects, and we're using stock code for that
part. Is there a configuration variable we've missed that would disable
this automatic URL redirect?

Alex


-- 
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/02f60d0d-8aad-4e10-9565-f2fa84950c75n%40googlegroups.com.
