> > http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html
> > 
> > Time to update your robots.txt parsers!
> 
> No, time to tell Yahoo to go back and do a better job.

They took the first step.  Think of web browsers 10 years ago: standards,
and then non-standard extensions.

> Does crawl-delay allow decimals?

You think people really want to be able to tell a crawler to fetch a
page at most every 5.6 seconds, and not 5?

> Negative numbers?

What would that do?  Prevent crawling?  Disallow?

> Could this spec be a bit better quality?

It's not a spec, it's an implementation, and they exposed it to the
masses first, even if other web crawlers had the ability to do this all
along.

> The words "positive integer" would improve things a lot.

That's just common sense to me. :)
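
Since we're on value handling: here's a minimal sketch, in Python, of how
a parser could treat the field defensively.  This is just my guess at sane
behavior, not anything Yahoo documents: accept finite positive numbers
(decimals included, in case those turn out to be allowed) and ignore
everything else.

    import math

    def parse_crawl_delay(value, default=None):
        """Return the delay in seconds, or `default` if unusable."""
        try:
            delay = float(value)   # tolerates "5" as well as "5.6"
        except (TypeError, ValueError):
            return default         # e.g. "Crawl-delay: fast" -> ignored
        if not math.isfinite(delay) or delay <= 0:
            return default         # negatives, NaN, inf -> ignored
        return delay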

> Sigh. It would have been nice if they'd discussed this on the
> list first. "crawl-delay" is a pretty dumb idea. Any value over
> one second means it takes forever to index a site.

I am sure their people are on the list; they are just being quiet, and
will probably remain silent now that their idea has been called dumb.

You have a good point with the second sentence.
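
To put a number on it: at Crawl-delay: 5, a 100,000-page site takes
100,000 x 5s = 500,000s, almost six days, for a single pass.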

> Ultraseek 
> has had a "spider throttle" option to add this sort of delay,
> but it is almost never used, because Ultraseek reads 25 pages
> from one site, then moves to another. There are many kinds of
> rate control.

I believe the same will happen to Yahoo's crawler and their extension.
Webmasters will see it and add it to their robots.txt with unacceptable
values.  Yahoo will have to override the specified values if they want to
compete with others.  The syntax will stick around in robots.txt, but it
will be useless, just as you describe in Ultraseek's case.
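
If they do override it, I'd expect something as simple as clamping the
webmaster's value to a range the crawler can live with.  A sketch, with
made-up bounds, just to illustrate the idea (parse_crawl_delay is from my
earlier sketch):

    MIN_DELAY = 0.5    # never hit a server faster than this
    MAX_DELAY = 30.0   # never let robots.txt stall the crawl beyond this

    def effective_delay(requested, default=1.0):
        """Treat the requested Crawl-delay as a hint, not an order."""
        if requested is None:
            return default
        return min(max(requested, MIN_DELAY), MAX_DELAY)

    # effective_delay(parse_crawl_delay("86400")) -> 30.0, not a day per page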

Otis
