Doug Cutting wrote:
http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html
Well, I think incrediBILL has a point: people might really start
excluding bots from their servers if the crawl traffic becomes too
much. What might help is for incrediBILL to offer an index of the
site, which should be smaller than the site itself. I am not sure
whether a "standard" exists for something like this. Basically, the
bot would ask the server whether an index exists, where it is located,
and what date it is from. The bot then decides either to download the
index or, otherwise, to start crawling the site.
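
To make the decision step concrete, here is a rough sketch in Python.
Everything in it is an assumption for illustration only: the IndexInfo
record, the choose_action function, and the one-week freshness limit
are made-up names and values, not part of Nutch or of any existing
standard. It only shows how a bot might pick between the two options
once the server has answered (or failed to answer) the question about
its index.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IndexInfo:
    """Hypothetical answer from the server about its site index."""
    location: str        # where the index can be downloaded from
    generated: datetime  # the date the index is from


def choose_action(
    info: Optional[IndexInfo],
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed freshness limit
) -> str:
    """Return "download_index" or "crawl_site".

    info is None when the server reports no index at all.
    """
    if info is None:
        # No index offered: fall back to a normal crawl.
        return "crawl_site"
    if now - info.generated > max_age:
        # Index exists but is older than the bot is willing to trust.
        return "crawl_site"
    # A recent index is available, so one download replaces the crawl.
    return "download_index"
```

For example, with a three-day-old index and the default limit the
function returns "download_index"; with no index, or one older than a
week, it returns "crawl_site". How old is "too old" would be the bot
operator's choice.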
Michi
--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
[EMAIL PROTECTED] [EMAIL PROTECTED]
+41 44 272 91 61