Re: [Robots] robot in python?

2003-11-17 Thread Walter Underwood
--On Sunday, November 16, 2003 4:23 PM +0100 Petter Karlström [EMAIL PROTECTED] 
wrote:
 
 I have written crawlers in Perl before, but I wish to try out Python for
 a hobby project. Has anybody here written a webbot in Python?

Verity Ultraseek is a web crawler and search engine written in
Python. Portions of it are C or C++ native modules. Ultraseek
is a commercial product, so we don't give out the code. Sorry.

 Python is of course a smaller language, so the libraries aren't as
 extensive as the Perl counterparts. Also, I find the documentation
 somewhat lacking (or it could be me being new to the language).

You may find that the threads and exceptions in Python more than
make up for anything you are missing in Perl. The Python libraries
are not as extensive, but that is mostly because they have one of
everything instead of five or six of everything.

Extracting links using a regular HTML parser works fine, and isn't
that much work. One of the major issues in an HTML parser is
dealing with all the illegal HTML on the web.
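
For illustration only, here is a minimal sketch of parser-based link
extraction. It is not Ultraseek code. It assumes the html.parser module
from the current Python standard library, and the names LinkExtractor and
extract_links are made up for this example. The parser tolerates sloppy
markup such as unquoted attributes and unclosed tags, and it skips
anything inside HTML comments.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, resolved against a base URL.

    Hypothetical helper for illustration; not taken from any product.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative references into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all <a href=...> links in html."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Deliberately sloppy markup: unclosed <a>, unquoted attribute value,
    # and a link hidden inside a comment.
    page = ('<p><a href="/a">one<a href=b.html>two</p>'
            '<!-- <a href="/hidden">x</a> -->')
    for url in extract_links(page, "http://example.com/dir/page.html"):
        print(url)
```

A real crawler would probably also want to look at frame, area, and link
tags, and to honor a base tag if the page has one.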

wunder
--
Walter Underwood
Principal Architect
Verity Ultraseek

___
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots


Re: [Robots] robot in python?

2003-11-17 Thread Alexander Halavais
Walter Underwood wrote:

Extracting links using a regular HTML parser works fine, and isn't
that much work. One of the major issues in an HTML parser is
dealing with all the illegal HTML on the web.
 

It really depends on what you are looking for, and how tolerant of 
errors you are. For most of what I do, I use the HTML parser, but I have 
also done simple expression matching to pull out links. This tends to 
overestimate the links (e.g., pulling out references in comments, etc.), 
and often yields fragments that are not really followable, but it is at 
least a possibility.
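
A rough sketch of the expression-matching approach, assuming nothing more
than the standard re module. The pattern and the name regex_links are
invented for this example and are not the exact expression I use. It
shows both problems mentioned above: the commented-out link is picked up,
and so is a bare fragment that a crawler cannot usefully follow.

```python
import re

# Hypothetical pattern: grab whatever follows href=, with or without quotes,
# up to the next quote, whitespace, or closing angle bracket.
HREF_RE = re.compile(r'href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def regex_links(html):
    """Return every href value found in the raw text, with no parsing."""
    return HREF_RE.findall(html)


if __name__ == "__main__":
    page = ('<a href="/real">r</a> '
            '<!-- <a href="/old">o</a> --> '
            '<a href="#top">t</a>')
    # The match list includes /old (inside a comment) and #top (a fragment).
    print(regex_links(page))
```

Filtering out values that start with "#" and resolving the rest against
the page URL removes some of the noise, but links inside comments or
scripts will still get through.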
