In my company we changed the default, and many others probably did the same. However, we must not ignore the behavior of irresponsible Nutch users. For that reason, use of the default must be blocked in code.
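To make "blocked in code" concrete, here is a minimal sketch of a startup check. It assumes the default in question is an identifying setting such as the crawler's agent name, which this thread does not actually spell out. `DEFAULT_AGENT_NAME`, `DefaultAgentError` and `validate_agent_name` are hypothetical names for illustration, not part of Nutch.

```python
# Hypothetical placeholder: the real shipped default is not named in this thread.
DEFAULT_AGENT_NAME = "DefaultCrawler"


class DefaultAgentError(ValueError):
    """Raised when the crawler is started with the shipped default identity."""


def validate_agent_name(agent_name):
    """Return the trimmed agent name, or raise if it is empty or the default."""
    cleaned = (agent_name or "").strip()
    if not cleaned or cleaned.lower() == DEFAULT_AGENT_NAME.lower():
        raise DefaultAgentError(
            "Refusing to crawl: configure your own agent name "
            "instead of the shipped default."
        )
    return cleaned
```

Called once before any fetch starts, a check like this would make an unconfigured install fail fast instead of crawling under a shared default identity.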
Just my 2 cents.

-----Original Message-----
From: Michael Wechner [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 15, 2006 9:30 AM
To: nutch-dev@lucene.apache.org
Subject: Re: IncrediBILL's Random Rants: How Much Nutch is TOO MUCH Nutch?

Doug Cutting wrote:
> http://incredibill.blogspot.com/2006/06/how-much-nutch-is-too-much-nutch.html

Well, I think incrediBILL has a point: people might really start excluding bots from their servers if it becomes too much.

What might help is for incrediBILL to offer an index of the site, which should be smaller than the site itself. I am not sure whether a "standard" exists for something like this. Basically, the bot would ask the server whether an index exists, where it is located, and what date it is from. The bot would then decide either to download the index or to start crawling the site.

Michi

--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com
http://lenya.apache.org
[EMAIL PROTECTED]
[EMAIL PROTECTED]
+41 44 272 91 61