It's probably more an issue with DNS resolution than with robots.txt. Even if
you respect the robots.txt instructions, you can still have N host names, or
even domain names, pointing to a single server, so politeness limits applied
per host name do not protect that server. This can be avoided in Nutch by
setting 'partition.url.mode' and 'fetcher.queue.mode' to 'byIP'.
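
To illustrate the idea (this is only a rough sketch of grouping by IP, not how
Nutch implements it), the Python below buckets URLs by the resolved address of
their host instead of by host name. The function names `resolve_ip` and
`group_by_ip` and the `resolver` parameter are made up for this example.

```python
import socket
from collections import defaultdict
from urllib.parse import urlsplit


def resolve_ip(host):
    """Resolve a host name to one IPv4 address using the system resolver."""
    return socket.gethostbyname(host)


def group_by_ip(urls, resolver=resolve_ip):
    """Group URLs into queues keyed by server IP instead of host name.

    `resolver` is a hypothetical hook so the lookup can be swapped out
    (for example, for a cache or a test double).
    """
    queues = defaultdict(list)
    cache = {}  # host name -> IP, so each host is resolved only once
    for url in urls:
        host = urlsplit(url).hostname
        if host is None:
            continue  # skip URLs without a host part
        if host not in cache:
            cache[host] = resolver(host)
        queues[cache[host]].append(url)
    return dict(queues)
```

With one queue per IP, a per-queue delay limits the load on the physical
server no matter how many host names point at it.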


On 16 August 2010 08:06, CatOs Mandros <[email protected]> wrote:

> Rather amusing :)
>
> Something similar was what made Grub gain a bit of a bad reputation...
> thank god we have the robots.txt file.
>
> On Sat, Aug 14, 2010 at 7:48 PM, Mattmann, Chris A (388J)
> <[email protected]> wrote:
> > LOL...
> >
> >
> > On 8/14/10 8:57 AM, "Ken Krugler" <[email protected]> wrote:
> >
> > Dear @80legs stop crushing metafilter.com from 2226 distinct IP addresses.
> > Your bots are DDOSing the site with thousands of requests. Stop.
> > <http://twitter.com/mathowie/status/20326707535>
> >
> > -- Ken
> >
> >
> > --------------------------------------------
> > Ken Krugler
> > +1 530-210-6378
> > http://bixolabs.com
> > e l a s t i c   w e b   m i n i n g
> >
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Senior Computer Scientist
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 171-266B, Mailstop: 171-246
> > Email: [email protected]
> > WWW:   http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Assistant Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >
>



-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com
