Bernadette,

This issue was the topic of a paper from the University of Bath last year - see 
http://opus.bath.ac.uk/34033/.

One outcome of this was a conversation with EPrints to make sure that 
metadata is not affected by a coversheet, so that the coversheet shouldn't 
affect Google crawling.  So it seems to be something that can be addressed 
through how the underlying system treats the coversheet (as you have noted 
by making it an image instead).
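
As an illustration only (this is not taken from the Bath paper or from 
EPrints), here is a minimal Python sketch of the kind of landing-page 
metadata meant here: the Highwire Press style citation_* meta tags that 
Google Scholar's inclusion guidelines describe. The function name 
citation_meta_tags and all sample values are hypothetical, and the sketch 
does not establish what Google will display for the PDF itself.

```python
from html import escape


def citation_meta_tags(title, authors, pdf_url, publication_date):
    """Build Highwire Press style <meta> tags for a repository landing page.

    Hypothetical helper for illustration; not part of EPrints or any library.
    """
    fields = [("citation_title", title)]
    # One citation_author tag per author, in order.
    fields += [("citation_author", name) for name in authors]
    fields.append(("citation_publication_date", publication_date))
    fields.append(("citation_pdf_url", pdf_url))
    # Escape attribute values so quotes or ampersands in a title stay valid HTML.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in fields
    )


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(
        citation_meta_tags(
            title="Example Article Title",
            authors=["Doe, Jane", "Roe, Richard"],
            pdf_url="https://repository.example.org/item/1/paper.pdf",
            publication_date="2014/11/26",
        )
    )
```

The point of the sketch is that the title and authors are stated in the 
landing page itself, independently of whatever text sits on the first page 
of the PDF.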

Regards,

Chris
On 27 Nov 2014, at 04:00, CODE4LIB automatic digest system wrote:


Date:    Wed, 26 Nov 2014 05:06:13 +0000
From:    Bernadette Houghton <bernadette.hough...@deakin.edu.au>
Subject: Re: Cover pages and Google

Dan, here's an example search result returned by Google Scholar:

https://dl.dropboxusercontent.com/u/29347274/google.jpg

The "This is the authors' final..." text comes from the PDF. Ideally, the text 
would be the article title.

Regards
Bern

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Dan 
Scott
Sent: Wednesday, 26 November 2014 11:54 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Cover pages and Google

Could you provide some examples of the resources that you're excluding and 
searches that return those results (maybe with screen shots in case Google 
serves up different results to different users)? I'm having a bit of trouble 
understanding your problem description.

I'll admit that my schema.org hammer is itchy, but I don't want to jump to 
conclusions as the problem might not even be a construction issue, let alone 
a nail :)

On 24 Nov 2014 22:57, "Bernadette Houghton" 
<bernadette.hough...@deakin.edu.au> wrote:

We've discovered that cover pages we add to items in our research
repository have the unwelcome side effect of causing Google to display
the cover page citation in search results, rather than the intro or preface.
The problem doesn't occur in Google Scholar, just the main Google
search engine.

One way to avoid this problem is to have the cover page formatted as
an image PDF rather than a text-readable PDF. Can anyone recommend
software that will convert a text-readable PDF to an image PDF?

TIA

Bernadette Houghton
Digitisation and Preservation Librarian Library
Deakin University
Locked Bag 20000, Geelong, VIC 3220
+61 3 52278230
bernadette.hough...@deakin.edu.au

www.deakin.edu.au
Deakin University CRICOS Provider Code 00113B


Important Notice: The contents of this email are intended solely for
the named addressee and are confidential; any unauthorised use,
reproduction or storage of the contents is expressly prohibited. If
you have received this email in error, please delete it and any
attachments immediately and advise the sender by return email or telephone.

Deakin University does not warrant that this email and any attachments
are error or virus free.


------------------------------

Date:    Wed, 26 Nov 2014 05:22:34 +0000
From:    Bernadette Houghton <bernadette.hough...@deakin.edu.au>
Subject: Re: Cover pages and Google

Thanks, Daron. We don't actually run OCR over the cover page PDF. We just save 
it in Word as an Adobe PDF, then append to the existing document, which is 
already in PDF format.

We also sometimes create cover sheets via a batch process using MS Word; again, 
just saving in default Adobe PDF format.

Bern

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Daron 
Dierkes
Sent: Wednesday, 26 November 2014 2:29 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Cover pages and Google

Perhaps it depends on how you are generating PDFs.  If it is straight Acrobat, 
then it should be as easy as making a PDF of all but the cover, running OCR, 
then adding the cover in as another page.  As long as you do not generate OCR 
again, the added pages should stay image only.  I haven't tried it, but I'm 
pretty sure that's possible.

If it is a question specific to your repository architecture then it might be 
harder.





**************************************************
To view the terms under which this email is 
distributed, please go to 
http://www2.hull.ac.uk/legal/disclaimer.aspx
**************************************************
