Re: [Robots] Yahoo evolving robots.txt, finally

2004-03-13 Thread Walter Underwood
as 1996. Most sites have pages/day or bytes/day limit, not instantaneous rate limits, so crawl-delay is controlling the wrong thing. Note that Google has implemented Allow lines with a limited wildcard syntax, so Yahoo isn't alone in being incompatible. wunder -- Walter Underwood Principal Architect
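
The distinction drawn above, a per-day page limit rather than an instantaneous rate limit, can be illustrated with a minimal Python sketch. The class name `DailyPageBudget`, the `pages_per_day` parameter, and the `try_fetch` method are all hypothetical; nothing here is taken from Ultraseek, Yahoo, or any robots.txt extension.

```python
from collections import defaultdict
from datetime import date


class DailyPageBudget:
    """Hypothetical per-host cap on pages fetched per calendar day."""

    def __init__(self, pages_per_day):
        self.pages_per_day = pages_per_day
        # Maps (host, day) to the number of pages already fetched.
        self._counts = defaultdict(int)

    def try_fetch(self, host, today):
        """Return True and count the fetch if the host still has budget today."""
        key = (host, today)
        if self._counts[key] >= self.pages_per_day:
            return False
        self._counts[key] += 1
        return True


budget = DailyPageBudget(pages_per_day=2)
day = date(2004, 3, 13)
results = [budget.try_fetch("example.com", day) for _ in range(3)]
```

A budget like this says nothing about how closely spaced the requests are, which is the opposite trade-off from a crawl-delay setting.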

Re: [Robots] Yahoo evolving robots.txt, finally

2004-03-12 Thread Walter Underwood
, because Ultraseek reads 25 pages from one site, then moves to another. There are many kinds of rate control. wunder -- Walter Underwood Principal Architect Verity Ultraseek ___ Robots mailing list [EMAIL PROTECTED] http://www.mccmedia.com/mailman
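
The "read a batch from one site, then move to another" behaviour mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `batched_crawl_order` and the input shape (a dict of site to URL list) are hypothetical, and the default batch size of 25 is taken from the message, not from any Ultraseek documentation.

```python
from collections import deque


def batched_crawl_order(site_queues, batch_size=25):
    """Yield (site, url) pairs, taking up to batch_size URLs per site per turn.

    site_queues is an assumed input shape: a dict mapping a site name to a
    list of URLs waiting to be fetched from that site.
    """
    sites = deque((site, deque(urls)) for site, urls in site_queues.items())
    while sites:
        site, urls = sites.popleft()
        # Take one batch from this site, then rotate to the next site.
        for _ in range(min(batch_size, len(urls))):
            yield site, urls.popleft()
        if urls:
            # The site still has pending URLs, so it rejoins the back of the line.
            sites.append((site, urls))


order = list(batched_crawl_order({"a": ["a1", "a2", "a3"], "b": ["b1"]}, batch_size=2))
```

This limits how long one site is visited in a row, which is a different kind of rate control from a fixed delay between requests.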

Re: [Robots] Robots.txt Evolution?

2004-01-11 Thread Walter Underwood
--On Sunday, January 11, 2004 11:44 AM -0500 Fred Atkinson [EMAIL PROTECTED] wrote:
> I was unaware of the 'Allow' command. Is there a URL that documents it?
The Allow directive is non-standard. Don't use it. wunder -- Walter Underwood Principal Architect Verity Ultraseek

Re: [Robots] robot in python?

2003-11-17 Thread Walter Underwood
HTML parser works fine, and isn't that much work. One of the major issues in an HTML parser is dealing with all the illegal HTML on the web. wunder -- Walter Underwood Principal Architect Verity Ultraseek

Re: [Robots] Hit Rate - testing is this mailing list alive?

2003-11-04 Thread Walter Underwood
. The aggregate spidering rate is higher, because there can be many spider threads making requests. wunder -- Walter Underwood Principal Architect Verity Ultraseek

[Robots] Re: Looksmart's robots.txt file

2002-05-29 Thread Walter Underwood
that sends a particular browser user-agent line (see http://www.cast.org/bobby/). wunder -- Walter Underwood [EMAIL PROTECTED] Senior Staff Engineer, Inktomi http://www.inktomi.com/

[Robots] Re: Robots.txt (was: Hello)

2001-06-11 Thread Walter Underwood
-- Walter Underwood Senior Staff Engineer, Enterprise Search, Inktomi Corp. http://search.inktomi.com/ All Mickey Mouse films are founded on the motif of leaving home in order to learn what fear is. -- Walter Benjamin, 1931 -- This message was sent by the Internet robots and spiders discussion list

[Robots] Re: Robots.txt (was: Hello)

2001-06-11 Thread Walter Underwood
reference for the robots meta tag is the original: http://www.robotstxt.org/wc/meta-user.html wunder -- Walter Underwood Senior Staff Engineer, Enterprise Search, Inktomi Corp. http://search.inktomi.com/

Re: On Spider Wanted requests

2000-05-10 Thread Walter Underwood
-On Wednesday, May 10, 2000 8:00 PM +0300 Toivio Tuomas [EMAIL PROTECTED] wrote: Nick, would it be much bother if you or somebody else sent a weekly message (say every Monday) informing new people where to find spiders? I remember searchtools.com mentioned; maybe some other good sources too.