[Dspace-devel] [DuraSpace JIRA] (DS-1138) robots.txt

Created Mon, 05 Mar 2012 02:15:14 -0800

robots.txt
----------

                 Key: DS-1138
                 URL: https://jira.duraspace.org/browse/DS-1138
             Project: DSpace
          Issue Type: Bug
            Reporter: Ivan Masár



By default, robots.txt in XMLUI allows all content to be indexed. As a result, 
crawlers index every browse, search and discovery page, and search engines then 
return mostly links to these result lists instead of the items themselves. 
I suggest disallowing the following pages by default:

User-agent: *
Disallow: /discover
Disallow: /search-filter

Note that the current robots.txt contains this comment:
# Uncomment the following line ONLY if sitemaps.org or HTML sitemaps are used
# and you have verified that your site is being indexed correctly.
# Disallow: /browse

Since all items should be accessible via the browse pages in the 
community/collection structure, /browse pages should be allowed by default to 
enable spiders to explore the whole repository. But /discover and 
/search-filter are surely redundant and only clutter the search results.
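
The effect of the proposed rules can be checked with a short script. The 
following is only a sketch using Python's standard urllib.robotparser; the 
helper name is_allowed and the sample item path /handle/123456789/1 are 
illustrative assumptions, not part of DSpace or of this issue.

```python
from urllib.robotparser import RobotFileParser

# The rules proposed in this issue, copied verbatim.
PROPOSED_RULES = """\
User-agent: *
Disallow: /discover
Disallow: /search-filter
"""


def is_allowed(path, rules=PROPOSED_RULES, agent="*"):
    """Return True if `agent` may fetch `path` under the given robots.txt text.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # /handle/123456789/1 is an assumed example of an item URL.
    for path in ("/discover", "/search-filter", "/browse",
                 "/handle/123456789/1"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

Under these rules /discover and /search-filter are rejected, while /browse 
and item pages remain crawlable, which matches the intent described above.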


