Is there something simple I can place in the JSP that will prohibit the crawlers from using my server resources?
TIA,
Jeff
Jeffrey Trimble
Systems Librarian
Maag Library
Youngstown State University
330-941-2483 (Office)
jtrim...@cc.ysu.edu
http://www.maag.ysu.edu
http://digital.maag.ysu.edu
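For page-level control from the JSP itself, one lightweight option is a robots meta tag in the page header template. Note the caveat: this only asks compliant crawlers not to index the page or follow its links; it does not stop the request from hitting the server, so robots.txt is the stronger fix for load problems. The template placement below is illustrative, not a specific DSpace file:

```jsp
<%-- Illustrative: in the JSP header template, ask well-behaved
     crawlers not to index this page or follow its links.
     Rogue bots will ignore this entirely. --%>
<meta name="robots" content="noindex, nofollow" />
```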
Jeff:
We had an issue with our local Google instance crawling our DSpace installation and causing huge issues. I rewrote the robots.txt to disallow anything besides the item pages themselves - no browsing pages or search pages and whatnot. Here is a copy of ours:
User-agent: *
Disallow:
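(Note that the two lines as archived above actually permit everything: an empty `Disallow:` places no restrictions, so the full file was presumably longer. A sketch of a robots.txt matching the description - block dynamically generated browse/search pages, leave item pages crawlable - with hypothetical JSPUI paths that you would adjust to your installation's actual URLs:)

```
User-agent: *
# Keep crawlers out of dynamically generated pages (paths are examples)
Disallow: /browse
Disallow: /simple-search
Disallow: /search
# Item ("handle") pages remain crawlable by default
```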
As of DSpace 1.5, sitemaps are supported which allow search engines to
selectively crawl only new items, while massively reducing the server
load:
http://www.dspace.org/1_5_1Documentation/ch03.html#N10B44
Unfortunately, it seems that relatively few DSpace instances actually
use this feature.
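In that era of DSpace the sitemaps were generated by a command-line script run on a schedule; a hedged sketch of a cron entry (the `[dspace]/bin/generate-sitemaps` path is an assumption - check the documentation linked above for your installation's actual script name and layout):

```
# Hypothetical cron entry: regenerate DSpace sitemaps nightly so search
# engines can pick up new items without re-crawling the whole site.
0 2 * * * [dspace]/bin/generate-sitemaps
```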
Jeff:
What I am using is a robots.txt file that I put in the dspace webapps
directory in tomcat. I think it's working (at least we haven't
crashed lately). If you're interested in seeing my robots.txt file,
I can send it to you.
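Before deploying a robots.txt it is worth sanity-checking that it blocks the expensive pages but still lets item pages through. A small sketch using Python's standard-library `urllib.robotparser`; the host and paths are made up for illustration:

```python
from urllib import robotparser

# A robots.txt in the spirit of the ones discussed in this thread:
# block dynamically generated browse/search pages, allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /browse
Disallow: /simple-search
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Item (handle) pages are still crawlable...
print(rp.can_fetch("*", "http://repository.example.edu/handle/123/456"))        # True
# ...but browse and search pages are not.
print(rp.can_fetch("*", "http://repository.example.edu/browse?type=title"))     # False
print(rp.can_fetch("*", "http://repository.example.edu/simple-search?query=x")) # False
```

The same check works against a live site by using `rp.set_url(...)` and `rp.read()` instead of parsing a local string.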
At 01:09 PM 1/14/2009, Jeffrey Trimble wrote:
On Wed, 14 Jan 2009, Shane Beers wrote:
Cc: dspace-tech Tech; Jeffrey Trimble
Subject: Re: [Dspace-tech] Google bots and web crawlers