do not index

Stefan Groschupf Thu, 22 Jun 2006 06:26:48 -0700

Hi,

As far as I can see, Nutch's HTML parser only supports the noindex meta tag (<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">), but there is also an unofficial HTML <noindex> tag.

http://www.webmasterworld.com/forum10003/2703.htm


Maybe supporting it would be another way to make Nutch more polite.
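
To illustrate the idea, here is a rough sketch (in Python, not Nutch's actual parser code) of dropping everything between <noindex> and </noindex> before the text is indexed. The helper name strip_noindex is hypothetical, and since the tag is unofficial, the case-insensitive, non-nested matching is only an assumption about how sites use it.

```python
import re

# Matches <noindex> ... </noindex>, ignoring case and spanning line breaks.
# The match is non-greedy so two separate blocks are not merged into one.
NOINDEX_BLOCK = re.compile(
    r"<noindex\b[^>]*>.*?</noindex\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_noindex(html: str) -> str:
    """Return html with every closed <noindex> block removed.

    Hypothetical helper for illustration; an unclosed <noindex> is left as-is.
    """
    return NOINDEX_BLOCK.sub("", html)


if __name__ == "__main__":
    page = "<p>public</p><noindex><p>navigation</p></noindex><p>more</p>"
    print(strip_noindex(page))
```

A real implementation would more likely skip these nodes while walking the parsed DOM rather than use a regular expression on the raw markup.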

Also, please remember my patch to support the crawl-delay property in robots.txt. That would also be important for making Nutch more polite, and may be a better approach than removing the Nutch crawler identification.
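
For readers unfamiliar with the directive, the following sketch shows what honoring Crawl-delay could look like. It is not the patch mentioned above; it uses Python's standard urllib.robotparser (crawl_delay is available from Python 3.6), and the helper name politeness_delay, the sample robots.txt, and the 1.0 second fallback are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def politeness_delay(robots_txt: str, agent: str, default_delay: float = 1.0) -> float:
    """Return the seconds to wait between requests for the given agent.

    Hypothetical helper: uses the Crawl-delay from robots.txt when one
    applies to the agent, otherwise falls back to default_delay.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)  # None when no Crawl-delay applies
    return default_delay if delay is None else float(delay)


# Sample robots.txt content, made up for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/

User-agent: Nutch
Crawl-delay: 10
"""

if __name__ == "__main__":
    print(politeness_delay(SAMPLE_ROBOTS, "Nutch"))
    print(politeness_delay(SAMPLE_ROBOTS, "OtherBot"))
```

A crawler would then sleep for the returned number of seconds between successive fetches from the same host.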


Thoughts?

Stefan
