I just found out that the web crawler does not honor the rules defined by the robots meta tag. I created a test document with the following tag:
<meta name="robots" content="noindex, nofollow">
This document also contains a link to a second document, in order to test the "nofollow" rule, but both documents were fetched and indexed by Solr.
Should I open a Jira issue about this? I hope it is easy to extend the crawler with this functionality, since this is a blocker for us.
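To make the expected behavior concrete, here is a minimal sketch in Python of the check I have in mind. It is not the crawler's actual code: the names RobotsMetaParser and robots_policy are hypothetical, and it only assumes that the crawler has the fetched HTML available as a string before it decides whether to index the page or queue its links.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return (may_index, may_follow) for one HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = parser.directives
    blocked_all = "none" in directives  # "none" means noindex plus nofollow
    may_index = not (blocked_all or "noindex" in directives)
    may_follow = not (blocked_all or "nofollow" in directives)
    return may_index, may_follow


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(robots_policy(page))
```

For the tag in my test document this returns (False, False), so I would expect the crawler to skip sending the page to Solr and to skip queuing the links found in it. The page itself still has to be fetched once in order to read the tag.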
Erlend

--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050