Hi Andrea:

The case you cite is not as obvious to me: how can we assume that the single 
PDF is the primary artifact (i.e. the one that the rest of the GS tags 
describe)?
We have cases where (in an Item) the article is in Word, or LaTeX, and a 
supplementary file is a PDF. In those cases the rule you propose would ask GS 
to index the wrong bitstream.
Because of cases like these, we deliberately enshrined the most conservative 
rule possible (if there is only one bitstream *and* it's a PDF), since Google 
Scholar asked us to value accuracy over completeness.

But you are absolutely right that the rule can be too restrictive in many ways. 
We kicked around (but didn't have time to implement for the 1st release) the 
notion of a site-specific, user-configurable 'map' function or functions that 
would yield 0 or 1 bitstreams per item. The idea is that if there *is* a 
consistent 'pattern' (like the one you mention), the page could dynamically 
determine the value of citation_pdf_url by calling the function. Design 
questions include:

* should there be a site-wide mapping rule, or one per collection (per format 
type, etc.)?
* there should probably be a default (maybe just the current hard-coded one), 
so that we don't force additional configuration
* how should the rule be expressed?
* how do we limit runtime penalties?

etc.
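
To make the 'map' idea a bit more concrete, here is a rough sketch, not a design 
proposal. Everything in it is hypothetical: FileEntry stands in for a DSpace 
bitstream, and PdfSelector / PdfSelectors are made-up names, not part of the 
DSpace API. It expresses the current conservative rule and the single-PDF rule 
from this thread as two interchangeable functions that each yield 0 or 1 files.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical stand-in for a DSpace bitstream: just a name and a MIME type.
final class FileEntry {
    final String name;
    final String mimeType;

    FileEntry(String name, String mimeType) {
        this.name = name;
        this.mimeType = mimeType;
    }

    boolean isPdf() {
        return "application/pdf".equals(mimeType);
    }
}

// Hypothetical 'map' function: yields 0 or 1 files for one item's ORIGINAL bundle.
interface PdfSelector {
    Optional<FileEntry> select(List<FileEntry> originalBundle);
}

final class PdfSelectors {
    private PdfSelectors() {}

    // Conservative rule: the bundle holds exactly one file, and that file is a PDF.
    static final PdfSelector ONLY_FILE_IS_PDF = files -> {
        if (files.size() == 1 && files.get(0).isPdf()) {
            return Optional.of(files.get(0));
        }
        return Optional.empty();
    };

    // Rule suggested in this thread: exactly one PDF, alongside any other files.
    static final PdfSelector SINGLE_PDF_AMONG_MANY = files -> {
        List<FileEntry> pdfs = files.stream()
                .filter(FileEntry::isPdf)
                .collect(Collectors.toList());
        if (pdfs.size() == 1) {
            return Optional.of(pdfs.get(0));
        }
        return Optional.empty();
    };
}
```

For an item holding article.docx plus supplement.pdf, ONLY_FILE_IS_PDF yields 
nothing, while SINGLE_PDF_AMONG_MANY yields supplement.pdf, which is exactly the 
wrong-bitstream case described above. A site whose items follow a consistent 
pattern could opt into the second rule, with the first remaining the default.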

I can probably dig up some notes on this if there is interest in that approach.

My 2 cents,

Richard
On Jan 13, 2013, at 11:38 PM, Andrea Schweer wrote:

Hi all,

I just discovered that DSpace (XMLUI, 1.8.2 but 3.0 has the same behaviour) 
generates the citation_pdf_url header for Google Scholar on an item page if and 
only if

  *    the item has exactly one bitstream in the ORIGINAL bundle (or the first 
such bundle, to be precise); and
  *    this bitstream is of type application/pdf

Code in master here: 
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/app/util/GoogleMetadata.java#L1007

I found old discussion around this in Jira here: 
https://jira.duraspace.org/browse/DS-396?focusedCommentId=17461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17461
 that explains what I assume is still the reasoning behind the current 
behaviour:

How does one choose, for instance, a.) which PDF in an item with multiple PDF 
bitstreams, b.) what is specified for a URL when there is no PDF for an item, 
c.) whether or not to specify a PDF if the only PDF available is not the main 
representative bitstream of the item. Google Scholar has said they are not 
interested in having citation tags for an item if this field is not provided 
for.

I find this a bit counter-intuitive, especially in the case of items with one 
PDF file plus one or more files in a different format -- surely there it 
should be fine to use the single PDF file in the citation_pdf_url? Are there 
any other opinions around this?

cheers,
Andrea


--
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand

------------------------------------------------------------------------------
Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
MVPs and experts. SALE $99.99 this month only -- learn more at:
http://p.sf.net/sfu/learnmore_122412
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
