Thanks for all the responses.

I do appreciate the value of having Google index our sites, but my
concern is that it seems to be doing it repeatedly. This particular
repository has only 551 items; to generate the traffic for which
GoogleBot seems responsible, it would have to be repeatedly
downloading every item rather than just grabbing the new items. Is
this normal?
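
One way to confirm the repeated downloading is to count GoogleBot requests per item in the web server's access log. The following is a minimal sketch, assuming an Apache/Tomcat "combined" log format; the `googlebot_hits` helper, the sample lines, and the `/handle/123456789/...` paths are all made up for illustration and are not taken from my logs.

```python
import re
from collections import Counter

# Assumes the "combined" log format: request line, status, size, referer, user agent.
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count requests per path whose user agent mentions Googlebot (hypothetical helper)."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits

# Made-up sample lines, for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [09/Sep/2011:10:00:00 +0200] "GET /handle/123456789/1 HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [09/Sep/2011:11:00:00 +0200] "GET /handle/123456789/1 HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [09/Sep/2011:11:05:00 +0200] "GET /handle/123456789/2 HTTP/1.1" 200 4096 "-" "{GOOGLEBOT_UA}"',
    '10.0.0.5 - - [09/Sep/2011:11:06:00 +0200] "GET /handle/123456789/2 HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (X11; Linux)"',
]

hits = googlebot_hits(sample)
# Paths that GoogleBot fetched more than once.
repeated = {path: count for path, count in hits.items() if count > 1}
print(repeated)
```

If most of the 551 items show up in `repeated` over a short period, that would support the suspicion that every item is being re-fetched rather than only the new ones.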

On 9 September 2011 16:40, Peter Dietz <[email protected]> wrote:
>
> GoogleBot can discover content through your sitemap/htmlmap, but there is no
> metadata in the sitemap, just a series of links to item/collection handles.
> GoogleBot will then have to crawl the item pages anyway to get the data.
> According to what I've read, and been told on the phone, GoogleBot is going
> to have best success crawling your site if it can incrementally crawl your
> site according to date.
> For a more in-depth look, here's a copy of a presentation from Robert Tansley
> (Google), "De-misting DSpace and Search Engines".
> https://atmire.com/labs17/handle/123456789/11796

Page six of that presentation states:

'Crawling HTML and/or Sitemaps
'...
'- Few of you are using sitemaps
'...
'- Be absolutely sure your "browse by date" pages aren't blocked in robots.txt'

Because I am using sitemaps, I have 'Disallow: /browse' in my
robots.txt. That would effectively preclude browse by date. But is
that OK because I'm using sitemaps?
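
The effect of that rule can be checked with Python's standard `urllib.robotparser`. This is only a sketch: the `User-agent: *` line and the `/browse?type=dateissued` and `/handle/123456789/1` paths are assumptions for illustration (the first is a typical DSpace browse-by-date URL, the second a placeholder item handle), not copied from my actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: only the Disallow line comes from this message.
ROBOTS_TXT = """\
User-agent: *
Disallow: /browse
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "http://esal.dut.ac.za"
browse_by_date = BASE + "/browse?type=dateissued"  # assumed browse-by-date path
item_page = BASE + "/handle/123456789/1"           # placeholder item handle

# Disallow rules match by path prefix, so everything under /browse is blocked.
browse_allowed = parser.can_fetch("Googlebot", browse_by_date)
item_allowed = parser.can_fetch("Googlebot", item_page)

print("browse by date allowed:", browse_allowed)
print("item page allowed:", item_allowed)
```

Under these assumptions the browse-by-date page is reported as blocked while item pages remain crawlable, which is exactly the situation the presentation warns about.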

> Lastly, if you're concerned about site load, you can go into webmaster tools
> (Google), and tell GoogleBot to crawl your site less aggressively.

I've considered this, but I think doing so won't prevent GoogleBot
from repeatedly downloading the entire site: it'll only slow the
process.

Maybe what I need to do is allow GoogleBot access only periodically? Or
I could go back to accepting the traffic and leaving it alone.

Ultimately, though, can I rest assured that this is expected behaviour?

Sean
-- 
Sean Carte
esAL Library Systems Manager
+27 72 898 8775
+27 31 373 2490
fax: 0866741254
http://esal.dut.ac.za/
