Re: [CODE4LIB] Cover pages and Google

Bernadette Houghton Tue, 25 Nov 2014 21:23:39 -0800

Thanks, Daron. We don't actually run OCR over the cover page PDF. We just save 
it from Word as an Adobe PDF, then append it to the existing document, which is 
already in PDF format.

We also sometimes create cover sheets via a batch process using MS Word; again, 
just saving in default Adobe PDF format.

Bern

-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Daron 
Dierkes
Sent: Wednesday, 26 November 2014 2:29 PM
To: [email protected]
Subject: Re: [CODE4LIB] Cover pages and Google

Perhaps it depends on how you are generating the PDFs. If it is straight 
Acrobat, then it should be as easy as making a PDF of everything but the cover, 
running OCR, then adding the cover in as another page. As long as you do not 
run OCR again, the added page should stay image-only. I haven't tried it, but 
I'm pretty sure that's possible.
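
For anyone who would rather script this than do it by hand in Acrobat, here is
a minimal sketch in Python, not a tested recipe for any particular repository.
It assumes Ghostscript is installed and on the PATH as `gs`. It uses the
`pdfimage24` device to re-render the cover as an image-only page, then the
`pdfwrite` device to put that cover in front of the existing document. The
function names and file names are hypothetical, and merging through `pdfwrite`
rewrites the body PDF rather than appending to it untouched, so check the
output before relying on it.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def rasterize_cmd(src: Path, dst: Path, dpi: int = 150) -> list[str]:
    """Build a Ghostscript command that re-renders every page of src as an image."""
    return [
        "gs",
        "-sDEVICE=pdfimage24",  # output is a PDF whose pages are 24-bit images
        f"-r{dpi}",             # rendering resolution in dots per inch
        "-o", str(dst),         # output file (also makes gs run non-interactively)
        str(src),
    ]


def merge_cmd(parts: list[Path], dst: Path) -> list[str]:
    """Build a Ghostscript command that concatenates PDFs in the given order."""
    return ["gs", "-sDEVICE=pdfwrite", "-o", str(dst), *map(str, parts)]


def add_image_cover(cover: Path, body: Path, out: Path, run=subprocess.run) -> Path:
    """Rasterize the cover, then write cover + body to out.

    `run` is injectable so the command lists can be inspected without
    actually invoking Ghostscript.
    """
    image_cover = cover.with_name(cover.stem + "-image.pdf")
    run(rasterize_cmd(cover, image_cover), check=True)
    run(merge_cmd([image_cover, body], out), check=True)
    return out


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    add_image_cover(Path("cover.pdf"), Path("thesis.pdf"), Path("thesis-with-cover.pdf"))
```

Because the cover never passes through OCR, it should carry no text layer for a
crawler to pick up, while the body keeps whatever text it already had.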

If it is a question specific to your repository architecture then it might be 
harder.

On Tuesday, November 25, 2014, Dan Scott <[email protected]> wrote:

> Could you provide some examples of the resources that you're excluding
> and searches that return those results (maybe with screen shots in
> case Google serves up different results to different users)? I'm
> having a bit of trouble understanding your problem description.
>
> I'll admit that my schema.org hammer is itchy, but I don't want to
> jump to conclusions as the problem might not even be a construction
> issue, let alone a nail :)
>
> On 24 Nov 2014 22:57, "Bernadette Houghton" <[email protected]> wrote:
>
> > We've discovered that cover pages we add to items in our research
> > repository have the unwelcome side effect of causing Google to
> > display the cover page citation in search results, rather than the
> > intro or preface.
> > The problem doesn't occur in Google Scholar, just the main Google
> > search engine.
> >
> > One way to avoid this problem is to have the cover page formatted as
> > an image PDF rather than a text-readable PDF. Can anyone recommend
> > software that will convert a text-readable PDF to an image PDF?
> >
> > TIA
> >
> > Bernadette Houghton
> > Digitisation and Preservation Librarian Library
> > Deakin University
> > Locked Bag 20000, Geelong, VIC 3220
> > +61 3 52278230
> > [email protected]
> > www.deakin.edu.au
> > Deakin University CRICOS Provider Code 00113B
> >
> >
> > Important Notice: The contents of this email are intended solely for
> > the named addressee and are confidential; any unauthorised use,
> > reproduction or storage of the contents is expressly prohibited. If
> > you have received this email in error, please delete it and any
> > attachments immediately and advise the sender by return email or
> > telephone.
> >
> > Deakin University does not warrant that this email and any
> > attachments are error or virus free.
> >
>

