there's the
seed of a good idea in it. However, it seems to me that if the authors of a
page actually bothered to create meta tags to increase search efficiency, it
would be much easier (semi-automated, even) to create a tag containing the
*most* relevant words, not the least.
Nick Arnett
-Original Message-
From: Sean 'Captain Napalm' Conner [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 23, 2001 11:26 PM
To: [EMAIL PROTECTED]
Subject: Re: [Robots] Re: Correct URL, shlash at the end ?
It was thus said that the Great George Phillips once stated:
Don't be misled
their tech and company (also attended the session).
Alex
At 12:10 PM 02/02/2001 -0800, Nick Arnett wrote:
Anyone know more about this company or project...?
http://news.bbc.co.uk/hi/english/sci/tech/newsid_1146000/1146589.stm
Nick Arnett
Sr. VP and Co-Founder
Opion Inc.
Direct phone/fax: 408-733-7613
, 31 Oct 2000 15:48:21 -0800
Reply-To: [EMAIL PROTECTED]
Sender: [EMAIL PROTECTED]
From: Nick Arnett [EMAIL PROTECTED]
Subject: Robots, km lists back up
Comments: To: [EMAIL PROTECTED], [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: Tue, 31 Oct 2000 23:54:17 -0800
Content-transfer-encoding: 7bit
x
of bounces that show addresses that are
not subscribed to the list... so if you see a few bounces when you post
(most come here, as they should), that may be the reason.
Nick Arnett
Sr. VP and Co-Founder
Opion Inc.
Direct phone: 408-733-7613 Fax: 408-904-7198
http://www.opion.com
From [EMAIL PROTECTED] Fri Nov 10 14:47:29 2000
Received: by mccmedia.com from localhost
(router,SLMail V2.7); Fri, 10 Nov 2000 14:47:29 -0800
Received: by mccmedia.com from mail2
(209.133.89.19::mail daemon; unverified,SLMail V2.7); Fri, 10 Nov 2000 14:47:26 -0800
Received: from
Certainly LWP is widely used, but I think it's an open question as to how
many LWP users use the robots.txt capabilities. I have used LWP
extensively, but have never bothered with the latter. My robots target a
handful of sites and really don't recurse, as such, so I just keep an eye on
those
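For anyone who does want the robots.txt check, here is a minimal sketch in Python rather than LWP, using the standard library's urllib.robotparser. The robots.txt text, the robot name "ExampleBot" and the example.com URLs are made up for illustration; a real robot would fetch the file from the target site first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed robot name, not one from this thread.
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/data.html")
allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
print(blocked, allowed)
```

Calling can_fetch() before every request costs very little, even for a robot that only visits a handful of sites.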
Having worked in Perl and Python, I'll recommend Python. Although I haven't
been using it for long, I'm definitely more productive with it. Performance
seems fine, though I haven't really pushed hard on it. I'm not seeing long,
mysterious time-outs as I occasionally did with LWP. And I hit
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Sean M. Burke
...
E.g., http://www.robotstxt.org/wc/norobots.html says:
User-agent [...] The robot should be liberal in interpreting
this field.
A case insensitive substring match of the name
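One way to read "liberal" here, as a rough sketch: lower-case both strings and test whether the User-agent value from robots.txt occurs anywhere in the robot's own name. The function name agent_matches and the robot names below are hypothetical; the "*" wildcard is the catch-all record described in the same norobots document.

```python
def agent_matches(record_agent: str, robot_name: str) -> bool:
    """Return True if a robots.txt User-agent value applies to robot_name."""
    token = record_agent.strip().lower()
    if token == "*":
        # The wildcard record applies to every robot.
        return True
    # Liberal interpretation: case-insensitive substring match.
    # An empty value is treated as matching nothing.
    return bool(token) and token in robot_name.lower()


# Hypothetical robot names, for illustration only.
print(agent_matches("ExampleBot", "examplebot/1.0"))
print(agent_matches("otherbot", "ExampleBot/1.0"))
```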
I've been hitting problems with a Python-based robot I'm working on and just
found out that there's a timeout module that will make it easy to implement
the kind of functionality that Tim Bray was suggesting here earlier. It
apparently works for any TCP connection. Here's the link:
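As a side note for current Python versions, where socket timeouts are built into the standard library: the sketch below is not the module mentioned above, only an illustration of the same idea. The helper name open_with_timeout and the 10-second value are arbitrary choices.

```python
import socket


def open_with_timeout(host: str, port: int, seconds: float) -> socket.socket:
    """Connect to host:port, raising an error if it takes longer than `seconds`."""
    # The timeout covers the connect and later reads and writes on this socket.
    return socket.create_connection((host, port), timeout=seconds)


# Alternatively, set a process-wide default that applies to every socket
# created afterwards, including those opened by higher-level libraries.
socket.setdefaulttimeout(10.0)
```

With a default timeout in place, a stalled server raises a timeout error instead of hanging the robot indefinitely.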
Commands need to be sent to [EMAIL PROTECTED].
Send unsubscribe robots in the body of a message to leave this list.
Nick
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of HuiFang Wang
Sent: Tuesday, March 26, 2002 2:30 AM
To: [EMAIL PROTECTED]
It looks to me as though Yahoo has some sort of robot defense operating. I
was just testing a multi-threaded robot that I use to analyze discussions,
including Yahoo's stock market boards. On the first run, it seemed to do
fine, but when I tried to run it again a few minutes later, it didn't
Anyone here figured out what Yahoo will tolerate in terms of spidering its
message header pages before it blocks the robot's IP address? Before I
start testing, I figured I'd see if anyone else here has already done so.
The duration of the block seems to lengthen, so testing could take a while.
essentially creating a toolbox with Python and MySQL, which I'm using to
create custom information products for consulting clients. For the moment,
those (obviously) are companies with a strong interest in Java.
Nick
--
Nick Arnett
Phone/fax: (408) 904-7198
[EMAIL PROTECTED
At the risk of talking to myself... Would a gateway from mailing lists to
NNTP address most of the issues I described? NNTP already knows about
threading, updating, etc.
However, I've been stymied by the problem of discovering new NNTP servers.
--
Nick Arnett
Phone/fax: (408) 904-7198
[EMAIL
[EMAIL PROTECTED] wrote:
I've created a robot, www.dead-links.com, and I wonder if this list is alive.
It is alive, but very, very quiet.
Nick
___
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots
need recursion. It's about 400 lines. A lot of it deals with things
like missing messages, zeroing in on desired date ranges, avoiding
downloading huge messages, recovery from failure, etc.
All of these talk to MySQL...
Nick
--
Nick Arnett
Phone/fax: (408) 904-7198
[EMAIL PROTECTED
what they'd consider acceptable.
And yet, their own servers don't seem to have a robots.txt that defines
any limitations. Sure would be nice if *they* would tell *us* what's
acceptable when crawling Yahoo!
Nick
--
Nick Arnett
Director, Business Intelligence Services
LiveWorld Inc.
Phone/fax
modified-by: UptimeBot team
Best regards.
Maks (aka Luft)
--
Nick Arnett
Director, Business Intelligence Services
LiveWorld Inc.
Phone/fax: (408) 551-0427
[EMAIL PROTECTED]