I've seen this as well. One early Tuesday morning I made a DSpace system change and, right afterward, the system load average spiked, giving me that bad feeling that something I had changed was no good. Analysis proved otherwise, however: the jsvc process was eating all available CPU. The Tomcat logs are pretty useless when pursuing problems like this, but the Apache access_log showed that our site's Google Search Appliance was intensely indexing the content served by this system. When that indexing completed, the load average went back to nil. Looking back through our system monitoring graphs, I saw that this load spike occurred every Tuesday morning, when the GSA was doing a full indexing run. Servers implemented in just Apache had no issues. (Apache is implemented in a compiled programming language.)
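For anyone wanting to run the same kind of check against their own access_log, here is a minimal sketch in Python. It assumes the Apache "combined" log format (the one that records the User-Agent); the function names are made up for illustration, and the regular expression does not handle escaped quotes inside a field, so treat it as a starting point rather than a finished tool.

```python
import re
from collections import Counter

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_by_agent(lines):
    """Count requests per User-Agent string; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


def count_by_hour(lines, agent_substring):
    """Count requests per hour for agents containing agent_substring.

    The match is case-insensitive. Keys look like '02/Apr/2013:05',
    i.e. the first 14 characters of the timestamp field.
    """
    needle = agent_substring.lower()
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("agent").lower():
            counts[match.group("time")[:14]] += 1
    return counts
```

Typical use would be to open the log file, pass the file object to `count_by_agent`, and look at `most_common(10)` to see which crawlers dominate. Calling `count_by_hour` with a substring of the suspect agent's name then shows whether its requests line up with the load spikes in your monitoring graphs.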
Java is notorious for its overhead, security issues, and JVM programming errors, and this is one more manifestation of the problems one can suffer when implementing applications in Java. There isn't a lot you can do about this as long as DSpace is implemented in Java and you want your site's content to be findable via search sites such as Google. You could try blocking non-mainstream indexing bots via firewall settings, but that quickly gets messy.

Richard Sims
Sr. Systems Engineer, Information Services & Technology
Boston University
http://people.bu.edu/rbs

On Apr 6, 2013, at 6:26 AM, Hilton Gibson <[email protected]> wrote:

> "We are currently experiencing that bot’s (eg. googlebot, bingbot) are using
> up all the cpu on our dspace installation" Can you explain how you determined
> this?
>
>
> On 5 April 2013 20:42, Ene Rammer Nielsen <[email protected]> wrote:
> Hi,
> We are currently experiencing that bot’s (eg. googlebot, bingbot) are using
> up all the cpu on our dspace installation. We were wondering whether anybody
> have any good advice? We have discussed different solutions, but we also
> noticed that the problem could be due to missing database indexs, so we would
> like to hear if anybody have any good ideas.
>
> Hope someone can help.

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

