Hi, Ene:

We are using the JSPUI at Cornell University, and we are using sitemaps, so you
should be able to set them up to help with your "bot" problem.

I run <path to dspace>/bin/dspace generate-sitemaps every Saturday. The
sitemaps end up in <path to dspace>/sitemaps and, as I recall, I had to register
with Google. I think it's all in the Install guide.
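
A weekly run like that could be scheduled with a crontab entry along these
lines. This is only a sketch: the 02:00 start time is an assumption (the
message only says "every Saturday"), and <path to dspace> remains a
placeholder for the local installation directory.

```shell
# Hypothetical crontab entry: regenerate the DSpace sitemaps every Saturday.
# Fields: minute hour day-of-month month day-of-week (6 = Saturday).
# The 02:00 start time is an assumption; <path to dspace> is a placeholder.
0 2 * * 6 <path to dspace>/bin/dspace generate-sitemaps
```

The entry would go into the crontab of the user that owns the DSpace
installation, for example via crontab -e.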

George Kozak
Digital Library Specialist
Cornell University Library Information Technologies (CUL-IT)
218 Olin Library
Cornell University
Ithaca, NY 14853
607-255-8924

-----Original Message-----
From: Ene Rammer Nielsen [mailto:ram...@ruc.dk] 
Sent: Monday, April 08, 2013 9:13 AM
To: Andrea Schweer; dspace-tech@lists.sourceforge.net; Sims, Richard B; Hilton 
Gibson
Subject: Re: [Dspace-tech] Bots and cpu

Hi,
Thanks for the responses.

We discovered our problems by looking through the Apache logs. Unfortunately we
are not using the XMLUI but the JSPUI, so I guess we can't use sitemap.xmap. We
are looking into setting up some restrictions in our robots.txt to see if that
will help.

We also talked about restricting bots' bandwidth by using mod-bw. Does anybody
have any experience with that?

Regards,
Ene Rammer Nielsen, Roskilde University Library.

-----Original Message-----
From: Andrea Schweer [mailto:schw...@waikato.ac.nz]
Sent: 8 April 2013 07:35
To: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Bots and cpu

Hi,

On 07/04/13 02:14, Sims, Richard B wrote:
> our site's Google Search Appliance was intensely indexing the content served 
> by this system. When that indexing completed, the load average went back to 
> nil. Looking back in our system monitoring graphs, I saw that this load spike 
> occurred every Tuesday morning - when the GSA was doing a full indexing run. 

When we experienced some problems with GSA in one of "my" repositories, we 
managed to improve the situation quite significantly by adding gsa-crawler to 
the list of known bots in the main XMLUI sitemap.xmap.
See here for some background:
http://www.mail-archive.com/dspace-tech@lists.sourceforge.net/msg19537.html

We're also disallowing Discovery and the browse indexes for gsa-crawler; it 
gets all relevant pages via the sitemap anyway.
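
For illustration, a restriction of this kind could be expressed as a
robots.txt group. The message does not say how the rule was actually
configured, so this is a sketch under assumptions: the user-agent token
"gsa-crawler" is taken from the text above, while the /discover and /browse
paths and the sample file name are hypothetical and would need checking
against the local URL layout.

```shell
# Sketch: write a robots.txt fragment to a sample file for review.
# Assumptions: the crawler matches the token "gsa-crawler", and Discovery and
# the browse indexes are served under /discover and /browse on this site.
cat > robots.txt.sample <<'EOF'
User-agent: gsa-crawler
Disallow: /discover
Disallow: /browse
EOF
```

After review, the group can be merged into the robots.txt served at the root
of the site; the crawler still reaches item pages through the sitemap.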

Sorry about the short e-mail, I'm about to head home. I can give some more 
details about the sitemap.xmap changes if anyone is interested.

cheers,
Andrea

--
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand


------------------------------------------------------------------------------
Minimize network downtime and maximize team effectiveness.
Reduce network management and security costs. Learn how to hire the most
talented Cisco Certified professionals. Visit the Employer Resources Portal 
http://www.cisco.com/web/learning/employer_resources/index.html
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
