Re: [Dspace-general] PDF Cover Pages & Google Scholar - Search Engine inclusion implications

Peter Dietz Thu, 18 Jun 2015 15:22:19 -0700

I didn't attend OR, but I'm familiar with the pitfalls of cover pages. Indexing
problems usually come from how a cover page is implemented, so building a
proper cover page was a requirement of this DSpace cover page implementation.

This feature was originally built for kb.OSU.edu, and I don't see an issue with
their cover page content in Scholar. Do you? They've had this feature for 3+ years.
https://scholar.google.com/scholar?hl=en&q=Variation+in+Syndesmon+Thalictroides+&btnG=&as_sdt=1%2C36&as_sdtp=

I imagine that the talk concerning pitfalls was likely a generic warning,
and not "Google Scholar has analyzed DSpace 5 cover pages and found indexing
problems with them". The classic cover page pitfall is putting huge
institutional branding on the cover page, so that each page says
"Downloaded from Mars University". In the DSpace 5 implementation, the item
title is set in the largest font.

This feature is opt-in, solves important archiving use cases, and preserves
the original document in the archive. You can run it in curation task mode
instead of dynamically, if needed. It also removes the need for institutions
to upload manually created cover pages (in which case the only archived
version has the cover page). Personally, I see DSpace adding features like
these as great for having a comprehensive archival system.

On Jun 18, 2015 4:58 PM, "Kim Shepherd" <kim.sheph...@gmail.com> wrote:

>
> On 19 June 2015 at 08:47, Monika C. Mevenkamp <moni...@princeton.edu>
> wrote:
>
>> The reason Anurag gave for disliking cover pages was that they can make it
>> difficult to discern things like author, title, journal, and so on. It seems
>> to me that if the generated cover page includes those metadata fields along
>> with custom text explaining the origin of the pdf, google scholar should
>> not have any difficulty getting to the metadata they are looking for.
>> Another ‘bad case’ Anurag mentioned was documents that have multiple cover
>> pages. I expect that the current implementation does avoid adding cover
>> pages to already ‘covered’ pdfs.
>>
>> Monika
>>
>
> Yes, one of my immediate thoughts was: is that a fundamental problem with
> cover pages, or is it possible to "Do It Right"?
> If we inject good metadata into the derived PDFs and ensure that titles,
> author, date were all high up on page one, could we actually be helping to
> an extent?
>
> I hadn't thought of the 'cloaking issue' though. If low entropy between
> page 1 of repository PDFs would cause Google to penalise us, then that's
> hard to get around without finding a way to not serve them the cover pages.
>
> Cheers
>
> Kim
>
> M: k...@shepherd.nz
> T: @kimshepherd
> P: +6421883635
>
> 0CCB D957 0C35 F5C1 497E CDCF FC4B ABA3 2A1A FAEC
> https://keybase.io/kshepherd
>
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Dspace-general mailing list
> Dspace-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-general
>
>

