Hi,
You have a multi-threaded spider crawling through
Yahoo? I think you'll be able to get by with one thread
per domain, throttled to one hit per second or slower.
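A minimal sketch of that kind of per-domain throttle, assuming Python. The class `DomainThrottle`, its `wait()` method, and the `min_interval` parameter are hypothetical names; the 1-second default only mirrors the suggestion above and is not a known Yahoo limit. A per-host lock keeps it correct even if more than one thread ends up on the same host.

```python
import threading
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Hypothetical helper: enforce a minimum gap between hits to one host.

    The 1.0 s default is the rule of thumb from this thread, not a
    published limit from any site.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock          # injectable so the logic can be tested
        self._sleep = sleep
        self._last_hit = {}          # host -> time of the previous request
        self._locks = {}             # host -> lock serializing that host
        self._guard = threading.Lock()

    def _lock_for(self, host):
        # Create each host's lock exactly once, even under concurrency.
        with self._guard:
            return self._locks.setdefault(host, threading.Lock())

    def wait(self, url):
        """Sleep until this URL's host may be hit again, then record the hit.

        Returns the number of seconds spent sleeping."""
        host = urlsplit(url).hostname or ""
        with self._lock_for(host):
            now = self._clock()
            last = self._last_hit.get(host)
            delay = 0.0
            if last is not None:
                delay = max(0.0, self.min_interval - (now - last))
            if delay:
                self._sleep(delay)
            self._last_hit[host] = self._clock()
            return delay
```

Each worker would call `throttle.wait(url)` immediately before fetching `url`; requests to different hosts never wait on each other.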

I'm actually curious to know what their threshold is
before they block you. If anyone finds out, I would be
glad to know.
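Since nobody seems to know the threshold, one hedged option is to let the crawler slow itself down when the server starts refusing it. The sketch below assumes that a 403, 429, or 503 after earlier successes means "you are going too fast"; nothing in this thread confirms that is how Yahoo decides to block. `AdaptiveDelay` and its parameters are hypothetical, and a caller would keep one instance per host.

```python
class AdaptiveDelay:
    """Hypothetical per-host delay: multiplicative back-off on refusals,
    slow linear recovery on successes.

    Assumption: the status codes below signal rate limiting; a real site
    may block on other criteria entirely.
    """

    BLOCK_STATUSES = {403, 429, 503}

    def __init__(self, initial=1.0, factor=2.0, minimum=1.0, maximum=300.0, step=0.1):
        self.delay = initial      # seconds to wait before the next hit
        self.factor = factor      # growth factor after a refusal
        self.minimum = minimum    # never go faster than this
        self.maximum = maximum    # cap on the back-off
        self.step = step          # how much to speed up after a success

    def record(self, status):
        """Update the delay from one HTTP status code and return it."""
        if status in self.BLOCK_STATUSES:
            self.delay = min(self.maximum, self.delay * self.factor)
        else:
            self.delay = max(self.minimum, self.delay - self.step)
        return self.delay
```

Backing off quickly and recovering slowly keeps the crawler near the lowest rate that stops drawing refusals. It will not lift a block that is already in place, which in the case below lasted at least 30 minutes.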

Stev

--- "Rasmus T. Mohr" <[EMAIL PROTECTED]> wrote:
> 
> I've had similar problems with Yahoo. The only
> solution was to change the IP address of the spider host.
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED]] On Behalf Of Nick Arnett
> > Sent: Saturday, July 27, 2002 10:54 PM
> > To: Internet robots discussion
> > Subject: [Robots] Does Yahoo have new robot defenses?
> >
> > It looks to me as though Yahoo has some sort of robot
> > defense operating. I was just testing a multi-threaded
> > robot that I use to analyze discussions, including
> > Yahoo's stock market boards. On the first run, it seemed
> > to do fine, but when I tried to run it again a few
> > minutes later, it didn't retrieve anything... so I tried
> > going to the message boards using IE on the same
> > machine. Every page is returning a 403 Forbidden error
> > now, even when I try to see robots.txt. As far as I
> > know, Yahoo has never even had a robots.txt file.
> >
> > I'm guessing that the speed of my robot triggered a
> > block against this IP address. Another machine, in the
> > same subnet, can access the pages just fine.
> >
> > I've been working on the underlying database for the
> > last few weeks, so I haven't run the spider lately.
> > Thus, I'm not sure when this behavior might have
> > started.
> >
> > My robot is quite fast and my connection yields
> > throughput of about 1 Mbit/s, so it certainly hit their
> > server fairly hard. But hey, it's Yahoo. If they can't
> > handle getting hit this hard on a mid-day Saturday,
> > it's hard to imagine who can.
> >
> > No lectures about well-behaved robots, please. I know,
> > I know. The next step for that robot will be to have
> > each thread hit completely different domains. Perhaps
> > each one will rotate through a few domains.
> >
> > Anybody know what Yahoo might be doing, or what its
> > policy is about robots? I haven't been able to find
> > anything that addresses the issue directly. I don't see
> > anything under its TOS that would clearly apply. If they
> > want to have a limit on robots, I sure would appreciate
> > it if they would say what it is...
> >
> > It's been about 30 minutes now and I'm still blocked,
> > it seems.
> >
> > Just checked from another machine: they still have no
> > robots.txt at all.
> >
> > Nick
> >
> > --
> > [EMAIL PROTECTED]
> > (408) 904-7198
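Nick's plan of having each thread hit completely different domains, with each thread rotating through a few of them, can be sketched as two steps: deal whole hosts out to workers so no two workers share a host, then have each worker alternate between its hosts. `assign_hosts` and `rotate` are hypothetical names; this is a sketch under those assumptions, not the robot's actual scheduler.

```python
from collections import deque
from urllib.parse import urlsplit


def assign_hosts(urls, n_workers):
    """Group URLs by host and deal whole hosts out to workers round-robin,
    so no two workers ever request pages from the same host."""
    by_host = {}
    for url in urls:
        by_host.setdefault(urlsplit(url).hostname or "", []).append(url)
    workers = [{} for _ in range(n_workers)]
    for i, (host, host_urls) in enumerate(by_host.items()):
        workers[i % n_workers][host] = host_urls
    return workers


def rotate(host_queues):
    """Yield one URL per host in turn, so consecutive requests from a
    single worker go to different hosts whenever possible."""
    queues = deque(deque(urls) for urls in host_queues.values() if urls)
    while queues:
        q = queues.popleft()
        yield q.popleft()
        if q:                      # host still has URLs: back of the line
            queues.append(q)
```

Rotation spreads the load but does not by itself space out hits: a worker with only one or two hosts left will still return to the same host quickly, so a per-host delay is still needed.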


_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots
