Hi,

On 09/04/13 01:50, Mark H. Wood wrote:
> Well, one thing I'd do is to complain to the people who run the GSA.
> IMHO it shouldn't be hitting any site that hard.  If there isn't a way
> to throttle it back...there should be.

There is: the crawler sends If-Modified-Since, and the server answers
with 304 instead of 200 when the page hasn't changed. The problem is
that DSpace (XMLUI) could be considered broken in this respect: it
treats requests it considers to come from bots differently from other
requests. For bot requests, it uses the item's last modified date; for
other requests, IIRC, it uses the date that the item page was last
cached. This is why adding gsa-crawler to the list of known bots in
sitemap.xmap helps. This behaviour is specific to XMLUI, so the folks
in this thread who use JSPUI need to check whether something similar
exists in their case.
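
To illustrate why the choice of date matters, here is a minimal Python
sketch of an If-Modified-Since check. This is not DSpace code: the
function name conditional_status and all the dates are made up for
illustration, and the real XMLUI logic may well differ in detail.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, else 200."""
    if not if_modified_since:
        return 200  # no header sent: always serve the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


# Hypothetical dates: the item was edited in March, the crawler last
# fetched it on 2 April, and the page cache was rebuilt on 8 April.
item_modified = datetime(2013, 3, 1, tzinfo=timezone.utc)
page_cached = datetime(2013, 4, 8, tzinfo=timezone.utc)
header = format_datetime(datetime(2013, 4, 2, tzinfo=timezone.utc),
                         usegmt=True)

print(conditional_status(item_modified, header))  # 304
print(conditional_status(page_cached, header))    # 200
```

With the item's own date the crawler is told nothing has changed; with
the cache date it is handed the full page again even though the item
itself is untouched.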

> DSpace can generate sitemaps.  Can the bot in question be taught to
> calm down and just use these?

Telling gsa-crawler about the sitemap (which, just to be absolutely
clear, is something completely different from the XMLUI sitemap.xmap
despite the similarity in names) did help. Since we did both, I'm not
sure which measure was ultimately more useful in taming gsa-crawler.

cheers,
Andrea

-- 
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand


_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette