On Thu, 11 Feb 2010, Michael White wrote:

>:session_id=9E40BFD899A2AA5C23E81404AF5B97A5:internal_error:-- URL Was: 
>https://dspace.stir.ac.uk/dspace/browse-title?bottom=1893/214
[snip]
> --------------------------------
> User-agent: *
>
> Disallow: /browse-author
> Disallow: /items-by-author
> Disallow: /browse-date
> Disallow: /browse-subject
> --------------------------------

You should add "/dspace" to the start of those disallowed patterns, 
because your DSpace URLs start with "/dspace" after the hostname.

The "standard" (or rather, consensus) has this to say about Disallow 
fields in robots.txt:
"The value of this field specifies a partial URL that is not to be 
visited. This can be a full path, or a partial path; any URL that starts 
with this value will not be retrieved."

Note the "starts with".
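
To illustrate the prefix rule, here is a minimal sketch using Python's 
standard urllib.robotparser module. The is_allowed helper and the two 
rule sets are made up for this example (they are not your actual file), 
and the test URL is assumed from the hostname in your log plus one of 
your disallowed paths.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Hypothetical helper: parse robots.txt text and test one URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Rule as currently written: no "/dspace" prefix.
ORIGINAL = """\
User-agent: *
Disallow: /browse-author
"""

# Rule with the "/dspace" prefix added (assumed fix).
FIXED = """\
User-agent: *
Disallow: /dspace/browse-author
"""

# Assumed example URL; its path is "/dspace/browse-author".
URL = "https://dspace.stir.ac.uk/dspace/browse-author"

# The path does not start with "/browse-author", so the rule never matches.
print(is_allowed(ORIGINAL, URL))  # True: crawlers may still fetch it

# With the prefix, the path starts with the Disallow value and is blocked.
print(is_allowed(FIXED, URL))     # False
```

The same check can be repeated for each of the other Disallow lines.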

See also: http://www.robotstxt.org/


Best regards,

--
Tom De Mulder <td...@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 11/02/2010 : The Moon is Waning Crescent (19% of Full)
