Rob,

Here is the robots.txt from dspace.mit.edu, which does follow your guidelines.

Crawl-delay: 1

User-agent: *
Disallow: /*browse-author
Disallow: /*items-by-author
Disallow: /*browse-date
Disallow: /*browse-subject
Disallow: /*browse-title
Disallow: /*type=author
Disallow: /*type=dateissued
Disallow: /*type=subject

User-Agent: Googlebot
Disallow: /*browse-author$
Disallow: /*items-by-author$
Disallow: /*browse-date$
Disallow: /*browse-subject$
Disallow: /*browse-title$
Disallow: /*type=author
Disallow: /*type=dateissued
Disallow: /*type=subject

Note: When I was at MIT, we chose not to have Google crawl and index all the browse index pages, because we really only want Item-level metadata exposed, and it's an excessive waste of bandwidth for crawlers to be trawling those. The above configuration informs Googlebot that it should only traverse the browse index "http://dspace.mit.edu/browse?type=title" to reach all the Items in our DSpace instance.

Note that this robots.txt is tailored for the XMLUI, where /browse?type=title exists.
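The difference between the two groups above is the trailing "$" on the Googlebot rules: under Google's wildcard semantics, "*" matches any run of characters and "$" anchors the pattern to the end of the URL, so "/*browse-title$" blocks only the bare browse page while "/*browse-title" blocks every URL containing that segment. A small illustrative matcher (my own sketch of those semantics, not the parser search engines actually use) makes this concrete:

```python
import re

def blocked(path, patterns):
    """Return True if any Disallow pattern matches the path, using
    Google's wildcard semantics: '*' matches any run of characters,
    a trailing '$' anchors the match to the end of the URL."""
    for pat in patterns:
        anchored = pat.endswith("$")
        if anchored:
            pat = pat[:-1]
        # Translate the robots.txt pattern into a regex: literal text
        # between '*' wildcards, anchored at the start of the path.
        regex = "^" + ".*".join(re.escape(part) for part in pat.split("*"))
        if anchored:
            regex += "$"
        if re.search(regex, path):
            return True
    return False

generic = ["/*browse-title"]      # rule from the User-agent: * group
googlebot = ["/*browse-title$"]   # rule from the Googlebot group

print(blocked("/browse-title", generic))                  # True
print(blocked("/browse-title", googlebot))                # True
print(blocked("/browse-title?starts_with=A", generic))    # True
print(blocked("/browse-title?starts_with=A", googlebot))  # False - URL
# does not end at 'browse-title', so the anchored rule lets it through
```

So the "$"-anchored Googlebot rules leave the parameterized browse URLs crawlable while the generic rules shut them off entirely for other crawlers.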

Cheers,
Mark


On Feb 5, 2009, at 1:21 PM, Robert Tansley wrote:

To all users of DSpace 1.5 and DSpace 1.5.1:

These versions of DSpace ship with a bad robots.txt file that prevents search engines such as Google Scholar or Yahoo from indexing any content on a DSpace site. To check if this applies to you:

- Visit your site's robots.txt -- http://your_dspace_hostname.edu/robots.txt
- If you see the following line you have a bad robots.txt:

Disallow: /browse

It is important that you REMOVE this line from your robots.txt to ensure that your DSpace instance is correctly indexed by search engines. More info on ensuring your DSpace site is correctly indexed here:

http://wiki.dspace.org/index.php?title=Ensuring_your_instance_is_indexed
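For anyone scripting this check across several instances, something like the following works (a sketch assuming a POSIX shell; the hostname in the comment is a placeholder, and here a sample bad file is written locally so the check is self-contained):

```shell
# Normally you would fetch the live file first, e.g.:
#   curl -s http://your_dspace_hostname.edu/robots.txt -o robots.txt
# For illustration, create a sample robots.txt containing the bad line:
printf 'User-agent: *\nDisallow: /browse\n' > robots.txt

# -x matches the whole line, -q suppresses output
if grep -qx 'Disallow: /browse' robots.txt; then
  echo 'bad robots.txt: remove the "Disallow: /browse" line'
else
  echo 'robots.txt looks ok'
fi
```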

Robert Tansley / Google
_______________________________________________
Dspace-general mailing list
dspace-gene...@mit.edu
http://mailman.mit.edu/mailman/listinfo/dspace-general

~~~~~~~~~~~~~
Mark R. Diggory
http://purl.org/net/mdiggory/homepage



_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
