Hi,
As far as I can see, Nutch's HTML parser only supports the noindex meta tag (<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">), but there is also an unofficial HTML <noindex> tag:
http://www.webmasterworld.com/forum10003/2703.htm
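
To illustrate the idea, here is a minimal sketch of how content wrapped in <noindex> could be dropped before indexing. This is not Nutch's parser API: the class NoIndexFilter and the method stripNoIndex are hypothetical names, and a regex over the raw HTML is a simplification I am assuming just for the example (a real implementation would more likely work on the parsed DOM).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: removes <noindex> regions from raw HTML.
public class NoIndexFilter {

    // Matches <noindex ...> ... </noindex>, case-insensitively and across line breaks.
    // The reluctant ".*?" stops at the first closing tag, so nested tags are not handled.
    private static final Pattern NOINDEX = Pattern.compile(
            "<noindex\\b[^>]*>.*?</noindex\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the HTML with every <noindex>...</noindex> region removed. */
    public static String stripNoIndex(String html) {
        return NOINDEX.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>keep</p><noindex><p>skip</p></noindex><p>also keep</p>";
        System.out.println(stripNoIndex(html));
    }
}
```

The text that remains after stripping would then be passed on to indexing as usual.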

Maybe supporting it would be another way to make Nutch more polite.
Also, please remember my patch to support the crawl-delay property in robots.txt. That would also be important for making Nutch more polite, and may be a better approach than removing the Nutch crawler identification.
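
For anyone who has not seen crawl-delay before, here is a rough, self-contained sketch of reading it from a robots.txt file. This is not the patch itself and does not use Nutch's robots.txt classes: CrawlDelayParser and crawlDelayMillis are hypothetical names, and the matching rules (substring match on the agent name, "*" as fallback, delay given in seconds) are my assumptions about common practice.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: extracts a Crawl-delay from robots.txt text.
public class CrawlDelayParser {

    /**
     * Returns the crawl delay in milliseconds that applies to the given agent,
     * preferring a group that names the agent over the "*" group, or -1 if none is set.
     */
    public static long crawlDelayMillis(String robotsTxt, String agent) {
        String wanted = agent.toLowerCase(Locale.ROOT);
        boolean inAgentLines = false;    // previous directive was a User-agent line
        boolean matchesSpecific = false; // current group names our agent
        boolean matchesWildcard = false; // current group contains "*"
        long specific = -1;
        long wildcard = -1;

        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw;
            int hash = line.indexOf('#');          // strip comments
            if (hash >= 0) line = line.substring(0, hash);
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;               // blank or malformed line

            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                if (!inAgentLines) {               // a new group starts here
                    matchesSpecific = false;
                    matchesWildcard = false;
                }
                inAgentLines = true;
                String v = value.toLowerCase(Locale.ROOT);
                if (v.equals("*")) matchesWildcard = true;
                else if (!v.isEmpty() && wanted.contains(v)) matchesSpecific = true;
            } else {
                inAgentLines = false;
                if (field.equals("crawl-delay")) {
                    try {
                        long ms = Math.round(Double.parseDouble(value) * 1000);
                        if (matchesSpecific) specific = ms;
                        else if (matchesWildcard) wildcard = ms;
                    } catch (NumberFormatException e) {
                        // ignore unparsable values
                    }
                }
            }
        }
        return specific >= 0 ? specific : wildcard;
    }
}
```

The fetcher would then wait at least that long between two requests to the same host, falling back to its configured default when -1 is returned.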

Thoughts?
Stefan
