Hi,
as far as I can see, Nutch's HTML parser only supports the robots meta
tag (<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">), but there
is also an unofficial HTML <noindex> tag, described here:
http://www.webmasterworld.com/forum10003/2703.htm
Supporting it might be another way to make Nutch more polite.
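
To make the idea concrete, here is a rough sketch of dropping <noindex> regions before the text is indexed. It is written in Python only for brevity and is not Nutch code; the name strip_noindex is made up for illustration, and a real implementation would more likely work on the parsed DOM than on a regular expression.

```python
import re

# Matches <noindex> ... </noindex>, case-insensitively and across line breaks.
# The non-greedy ".*?" stops at the first closing tag, so several separate
# regions on one page are removed individually.
NOINDEX_RE = re.compile(
    r"<noindex\b[^>]*>.*?</noindex\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_noindex(html: str) -> str:
    """Return html with every complete <noindex>...</noindex> region removed.

    Hypothetical helper: an unclosed <noindex> is left untouched rather than
    guessing where the region was meant to end.
    """
    return NOINDEX_RE.sub("", html)


if __name__ == "__main__":
    page = "<p>public</p><noindex><p>navigation</p></noindex><p>more</p>"
    print(strip_noindex(page))
```

Only the text to be indexed would be filtered this way; whether links inside such a region should still be followed is a separate question.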
Also, please remember my patch to support the crawl-delay property in
robots.txt. That would also be important for making Nutch more
polite, and may be a better approach than removing the Nutch crawler
identification.
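
For reference, a minimal sketch of how a Crawl-delay value could be read from robots.txt. This is not the patch itself, just an illustration in Python; the function name crawl_delay, the substring matching of agent names, and the default of 0 seconds are all assumptions made for the example.

```python
def crawl_delay(robots_txt: str, agent: str, default: float = 0.0) -> float:
    """Return the Crawl-delay in seconds that applies to `agent`.

    Simplified, hypothetical parser: a record that names the agent wins
    over the "*" record; if neither sets a delay, `default` is returned.
    """
    agent = agent.lower()
    specific = None       # delay from a record naming our agent
    wildcard = None       # delay from the "*" record
    current_agents = []   # User-agent values of the record being read
    in_rules = False      # True once the record has a non-User-agent line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()

        if field == "user-agent":
            if in_rules:                      # a new record starts here
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
            continue

        in_rules = True
        if field != "crawl-delay":
            continue
        try:
            delay = float(value)
        except ValueError:
            continue                          # ignore malformed values
        if any(a and a != "*" and a in agent for a in current_agents):
            if specific is None:
                specific = delay
        elif "*" in current_agents and wildcard is None:
            wildcard = delay

    if specific is not None:
        return specific
    if wildcard is not None:
        return wildcard
    return default


if __name__ == "__main__":
    robots = "User-agent: *\nCrawl-delay: 10\n\nUser-agent: nutch\nCrawl-delay: 2.5\n"
    print(crawl_delay(robots, "Nutch"))
```

The fetcher would then wait at least that many seconds between two requests to the same host.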
Thoughts?
Stefan