[Robots] Anti-thesaurus proposal

2001-11-20 Thread Nick Arnett
there's the seed of a good idea in it. However, it seems to me that if the authors of a page would actually bother to create meta-tags to increase search efficiency, it would be much easier (semi-automated, even) to create a tag containing the *most* relevant words, not the least. Nick Arnett

[Robots] FW: Re: Correct URL, shlash at the end ?

2001-11-24 Thread Nick Arnett
-Original Message- From: Sean 'Captain Napalm' Conner [mailto:[EMAIL PROTECTED]] Sent: Friday, November 23, 2001 11:26 PM To: [EMAIL PROTECTED] Subject: Re: [Robots] Re: Correct URL, shlash at the end ? It was thus said that the Great George Phillips once stated: Don't be mislead

Re: Rumorbot

2001-02-03 Thread Nick Arnett
their tech and company (also attended the session). Alex At 12:10 PM 02/02/2001 -0800, Nick Arnett wrote: Anyone know more about this company or project...? http://news.bbc.co.uk/hi/english/sci/tech/newsid_1146000/1146589.stm Nick Arnett Sr. VP and Co-Founder Opion Inc. Direct phone/fax: 408-733-7613

[no subject]

2002-02-21 Thread Nick Arnett
, 31 Oct 2000 15:48:21 -0800 Reply-To: [EMAIL PROTECTED] Sender: [EMAIL PROTECTED] From: Nick Arnett [EMAIL PROTECTED] Subject: Robots, km lists back up Comments: To: [EMAIL PROTECTED], [EMAIL PROTECTED] To: [EMAIL PROTECTED] Date: Tue, 31 Oct 2000 23:54:17 -0800 Content-transfer-encoding: 7bit x

Robot list bounces

2001-03-08 Thread Nick Arnett
of bounces that show addresses that are not subscribed to the list... so if you see a few bounces when you post (most come here, as they should), that may be the reason. Nick Arnett Sr. VP and Co-Founder Opion Inc. Direct phone: 408-733-7613 Fax: 408-904-7198 http://www.opion.com

[no subject]

2002-02-21 Thread Nick Arnett
From [EMAIL PROTECTED] Fri Nov 10 14:47:29 2000 Received: by mccmedia.com from localhost (router,SLMail V2.7); Fri, 10 Nov 2000 14:47:29 -0800 Received: by mccmedia.com from mail2 (209.133.89.19::mail daemon; unverified,SLMail V2.7); Fri, 10 Nov 2000 14:47:26 -0800 Received: from

[Robots] Re: SV: matching and User-Agent: in robots.txt

2002-03-14 Thread Nick Arnett
Certainly LWP is widely used, but I think it's an open question as to how many LWP users use the robots.txt capabilities. I have used LWP extensively, but have never bothered with the latter. My robots target a handful of sites and really don't recurse, as such, so I just keep an eye on those
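
For readers wondering what "robots.txt capabilities" look like in a client library, here is a minimal sketch in Python rather than Perl. It uses the standard library's `urllib.robotparser`, not LWP; the robot name `ExampleBot`, the host `example.com`, and the rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; a real robot would fetch /robots.txt from the target site.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /private/",
]

def build_parser(lines):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

parser = build_parser(SAMPLE_RULES)

# "ExampleBot" and example.com are placeholders, not names from the thread.
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/page.html")
allowed = parser.can_fetch("ExampleBot", "http://example.com/public/page.html")
```

A robot that only visits a handful of known sites, as described above, could run a check like this once per site at startup instead of on every request.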

[Robots] Re: better language for writing a Spider ?

2002-03-14 Thread Nick Arnett
Having worked in Perl and Python, I'll recommend Python. Although I haven't been using it for long, I'm definitely more productive with it. Performance seems fine, though I haven't really pushed hard on it. I'm not seeing long, mysterious time-outs as I occasionally did with LWP. And I hit

[Robots] Re: matching and UserAgent: in robots.txt

2002-03-14 Thread Nick Arnett
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Sean M. Burke ... E.g., http://www.robotstxt.org/wc/norobots.html says: User-agent [...] The robot should be liberal in interpreting this field. A case insensitive substring match of the name
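
The quoted recommendation can be sketched roughly as follows. This is one possible reading, not a reference implementation: it assumes the `User-agent` value from robots.txt should appear as a substring of the robot's own name once version information is stripped, and that `*` matches every robot. The function name `agent_matches` and the robot names are hypothetical.

```python
def agent_matches(robot_name: str, user_agent_field: str) -> bool:
    """Return True if a robots.txt User-agent value applies to robot_name.

    Assumption: the field value is matched as a case-insensitive substring
    of the robot's name, after dropping any "/version" suffix.
    """
    field = user_agent_field.strip().lower()
    if field == "*":
        return True  # wildcard record applies to every robot
    name = robot_name.split("/")[0].strip().lower()  # drop version info
    return bool(field) and field in name

# Hypothetical robot names, for illustration only.
match_exact = agent_matches("ExampleBot/1.0", "examplebot")
match_partial = agent_matches("ExampleBot/1.0", "Example")
match_other = agent_matches("ExampleBot/1.0", "OtherBot")
```

Matching in the opposite direction (robot name as a substring of the field) is another defensible reading of the same sentence, which is part of why the wording invites debate.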

[Robots] Python timeouts

2002-03-25 Thread Nick Arnett
I've been hitting problems with a Python-based robot I'm working on and just found out that there's a timeout module that will make it easy to implement the kind of functionality that Tim Bray was suggesting here earlier. It apparently works for any TCP connection. Here's the link:
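
As a rough illustration of per-connection timeouts, the sketch below uses only the standard `socket` module. It is not the module the message links to; the function name `read_with_timeout` and its parameters are assumptions made for this example.

```python
import socket

def read_with_timeout(host: str, port: int, timeout: float, nbytes: int = 1024):
    """Connect and read up to nbytes; return None if the peer stays silent.

    The timeout passed to create_connection() covers the connect step and
    stays in effect for later recv() calls on the same socket.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return conn.recv(nbytes)
    except socket.timeout:
        return None  # give up instead of hanging on a stalled server
```

A robot could wrap each fetch this way and move on to the next URL when `None` comes back, rather than blocking a worker thread indefinitely.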

[Robots] Re: unsubscibe

2002-03-26 Thread Nick Arnett
Commands need to be sent to [EMAIL PROTECTED]. Send unsubscribe robots in the body of a message to leave this list. Nick -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of HuiFang Wang Sent: Tuesday, March 26, 2002 2:30 AM To: [EMAIL PROTECTED]

[Robots] Does Yahoo have new robot defenses?

2002-07-27 Thread Nick Arnett
It looks to me as though Yahoo has some sort of robot defense operating. I was just testing a multi-threaded robot that I use to analyze discussions, including Yahoo's stock market boards. On the first run, it seemed to do fine, but when I tried to run it again a few minutes later, it didn't

[Robots] Safe parameters for spidering Yahoo message header pages?

2002-08-02 Thread Nick Arnett
Anyone here figured out what Yahoo will tolerate in terms of spidering its message header pages before it blocks the robot's IP address? Before I start testing, I figured I'd see if anyone else here has already done so. The duration of the block seems to lengthen, so testing could take a while.

RE: [Robots] Post

2002-11-08 Thread Nick Arnett
essentially creating a toolbox with Python and MySQL, which I'm using to create custom information products for consulting clients. For the moment, those (obviously) are companies with a strong interest in Java. Nick -- Nick Arnett Phone/fax: (408) 904-7198 [EMAIL PROTECTED

RE: [Robots] Efficient crawling of mailing list archives?

2003-02-28 Thread Nick Arnett
At the risk of talking to myself... Would a gateway from mailing lists to NNTP address most of the issues I described? NNTP already knows about threading, updating, etc. However, I've been stymied by the problem of discovering new NNTP servers. -- Nick Arnett Phone/fax: (408) 904-7198 [EMAIL

Re: [Robots] Is this mailing linst alive?

2003-11-04 Thread Nick Arnett
[EMAIL PROTECTED] wrote: I've created a robot, www.dead-links.com and i wonder if this list is alive. It is alive, but very, very quiet. Nick ___ Robots mailing list [EMAIL PROTECTED] http://www.mccmedia.com/mailman/listinfo/robots

Re: [Robots] robot in python?

2003-11-26 Thread Nick Arnett
need recursion. It's about 400 lines. A lot of it deals with things like missing messages, zeroing in on desired date ranges, avoiding downloading huge messages, recovery from failure, etc. All of these talk to MySQL... Nick -- Nick Arnett Phone/fax: (408) 904-7198 [EMAIL PROTECTED

Re: [Robots] Yahoo evolving robots.txt, finally

2004-03-15 Thread Nick Arnett
what they'd consider acceptable. And yet, their own servers don't seem to have a robots.txt that defines any limitations. Sure would be nice if *they* would tell *us* what's acceptable when crawling Yahoo! Nick -- Nick Arnett Director, Business Intelligence Services LiveWorld Inc. Phone/fax

[Robots] [Fwd: add-robot@robotstxt.org is not working]

2004-04-06 Thread Nick Arnett
modified-by: UptimeBot team Best regards. Maks (aka Luft) -- Nick Arnett Director, Business Intelligence Services LiveWorld Inc. Phone/fax: (408) 551-0427 [EMAIL PROTECTED] ___ Robots mailing list [EMAIL PROTECTED] http://www.mccmedia.com/mailman/listinfo