Hi, do you have a multi-threaded spider crawling through Yahoo? I think you'll be able to get by with one thread per domain, with that thread throttled to one hit per second or slower.
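To make the per-domain throttle concrete, here is a minimal sketch in Python. It is an illustration only, not anything Yahoo documents or anyone on the list has posted: the class name DomainThrottle, its parameters, and the example hostnames are all hypothetical, and the 1-second interval is just the guess from the paragraph above, not a known threshold.

```python
import threading
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper for illustration; min_interval=1.0 is the
    "1 second per hit" guess, not a limit published by any site.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = {}  # host -> earliest time the next request may start

    def wait(self, url):
        """Block until a request to url's host is allowed; return seconds waited."""
        host = urlsplit(url).hostname or ""
        with self._lock:
            now = self._clock()
            # Reserve the next free slot for this host while holding the lock,
            # so several threads hitting one host are spaced min_interval apart.
            slot = max(now, self._next_slot.get(host, now))
            self._next_slot[host] = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            # Sleep outside the lock so threads working on other hosts proceed.
            self._sleep(delay)
        return delay
```

Each worker thread would call throttle.wait(url) just before fetching url. Requests to different hosts are not delayed by each other, while repeated requests to one host are spaced at least min_interval seconds apart, whichever thread issues them.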
I'm actually curious to know what their threshold is before they block you. If anyone finds out, I would be glad to know.

Stev

--- "Rasmus T. Mohr" <[EMAIL PROTECTED]> wrote:
> I've had similar problems with Yahoo. The only solution was to change
> the IP-address of the spider host.
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of
> > Nick Arnett
> > Sent: Saturday, July 27, 2002 10:54 PM
> > To: Internet robots discussion
> > Subject: [Robots] Does Yahoo have new robot defenses?
> >
> > It looks to me as though Yahoo has some sort of robot defense
> > operating. I was just testing a multi-threaded robot that I use to
> > analyze discussions, including Yahoo's stock market boards. On the
> > first run, it seemed to do fine, but when I tried to run it again a
> > few minutes later, it didn't retrieve anything... so I tried going
> > to the message boards using IE on the same machine. Every page is
> > returning a 403 Forbidden error now -- even when I try to see
> > robots.txt. As far as I know, Yahoo has never even had a robots.txt
> > file.
> >
> > I'm guessing that the speed of my robot triggered a block against
> > this IP address. Another machine, in the same subnet, can access
> > the pages just fine.
> >
> > I've been working on the underlying database for the last few
> > weeks, so I haven't run the spider lately. Thus, I'm not sure when
> > this behavior might have started.
> >
> > My robot is quite fast and my connection yields throughput of about
> > 1 mbit/sec, so it certainly hit their server fairly hard. But hey,
> > it's Yahoo. If they can't handle getting hit this hard on a mid-day
> > Saturday, it's hard to imagine who can.
> >
> > No lectures about well-behaved robots, please. I know, I know. The
> > next step for that robot will be to have each thread hit completely
> > different domains. Perhaps each one will rotate through a few
> > domains.
> >
> > Anybody know what Yahoo might be doing, or what its policy is about
> > robots? I haven't been able to find anything that addresses the
> > issue directly. I don't see anything under its TOS that would
> > clearly apply. If they want to have a limit on robots, I sure would
> > appreciate it if they would say what it is...
> >
> > It's been about 30 minutes now and I'm still blocked, it seems.
> >
> > Just checked from another machine -- they still have no robots.txt
> > at all.
> >
> > Nick
> >
> > --
> > [EMAIL PROTECTED]
> > (408) 904-7198

_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots