Thanks for all the responses. I do appreciate the value of having Google index our sites, but my concern is that it seems to be doing it repeatedly. This particular repository has only 551 items; to generate the traffic for which GoogleBot seems responsible, it would have to be repeatedly downloading every item rather than just grabbing the new items. Is this normal?
On 9 September 2011 16:40, Peter Dietz <[email protected]> wrote:
>
> GoogleBot can discover content through your sitemap/htmlmap, but there is no
> metadata in the sitemap, just a series of links to item/collection handles.
> GoogleBot will then have to crawl the item pages anyways to get the data.
> According to what I've read, and been told on the phone, GoogleBot is going
> to have best success crawling your site if it can incrementally crawl your
> site according to date.
>
> For more in depth look, here's a copy of a presentation from Robert Tansley
> (Google) "De-misting DSpace and Search Engines".
> https://atmire.com/labs17/handle/123456789/11796

Page six of that presentation states:

'Crawling HTML and/or Sitemaps
'...
'- Few of you are using sitemaps
'...
'- Be absolutely sure your "browse by date" pages aren't blocked in robots.txt'

Because I am using sitemaps, I have 'Disallow: /browse' in my robots.txt. That effectively blocks the browse-by-date pages. Is that OK, given that I'm using sitemaps?

> Lastly, if you're concerned about site load, you can go into webmaster tools
> (Google), and tell GoogleBot to crawl your site less aggressively.

I've considered this, but I don't think it will prevent GoogleBot from repeatedly downloading the entire site: it will only slow the process.

Maybe what I need to do is allow GoogleBot access only periodically? Or I could simply accept the traffic and leave it alone.

Ultimately, though, can I rest assured that this is expected behaviour?

Sean
--
Sean Carte
esAL Library Systems Manager
+27 72 898 8775
+27 31 373 2490
fax: 0866741254
http://esal.dut.ac.za/

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
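
For anyone wanting to check what a rule like 'Disallow: /browse' actually blocks, here is a minimal sketch using Python's standard urllib.robotparser. The host name and the three paths are assumptions made up for illustration (they are not taken from the repository discussed above), and the 'User-agent: *' line is assumed as well. The sketch only shows what the rule matches; it does not say how GoogleBot weighs sitemaps against browse pages.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: only the Disallow line comes from the message above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /browse
"""


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical host and paths, for illustration only.
BASE = "http://repository.example.org"

browse_by_date = is_allowed(ROBOTS_TXT, BASE + "/browse?type=dateissued")
item_page = is_allowed(ROBOTS_TXT, BASE + "/handle/123456789/1")
sitemap = is_allowed(ROBOTS_TXT, BASE + "/sitemap")

print("browse by date allowed:", browse_by_date)
print("item page allowed:     ", item_page)
print("sitemap allowed:       ", sitemap)
```

Under these assumptions the prefix rule blocks every URL whose path starts with /browse, including any browse-by-date listing, while item pages and the sitemap stay reachable.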

