Re: [Robots] robot in python?
SsolSsinclair wrote:

Walter: You may find that the threads and exceptions in Python more than make up for anything you are missing in Perl. The Python libraries are not as extensive, but that is mostly because they have one of everything instead of five or six of everything.

SsolSsinclair: This concluding statement [above] comes from Walter and outlines the coding advantages of using Python. A person capable of inventing these statements on the spot would know them to be true. I am unclear, therefore, why portions of Verity UltraSeek [a commercial product] would need to use C or C++ modules.

Well, Walter compared Python and Perl, not Python and C or C++. I can see why portions of a bot would be written in C or C++. Performance issues would perhaps not be too wild a guess.

I am unclear, and have an uninformed opinion, about the intent of "finding" web pages whose original author did not wish them to be found. I cannot perceive any possibility of these pages being of interest to the general public, or to the corporate citizenship. However, this statement by Walter:

This tends to overestimate the links (e.g., pulling out references in comments, etc.), and often yields fragments that are not really followable, but it is at least a possibility.

seems to indicate there is a difference between the # of pages retrieved and the "possible number of pages that could be retrieved". This numerical difference is QUITE notable, and of interest to competing coders. It is, however, a privately held #. Thanks Alex.

Sorry, but what you're discussing is quite a different matter from the discussion you're quoting. Overestimation by, for example, regexps just means that the bot may mistakenly store some things as tags that really aren't. Overestimating what you may need in more general terms is a different (albeit interesting) matter.

Primary concern of Petter:: Still wonder how to handle logins, though...

This is really your determination to make. Taking anyone's opinion on the matter would end in your system being less secure. I would guess some method of encryption. I am not really clear on why a PassWrdMgr is necessary in the development of a search engine crawler. This system should really already be in place in your work environment. Maybe even a firewall?

The reason I need some password management is that my app has to log in to a secure site. Encryption would certainly be nice!

/Petter
___
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots
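Petter's need for password management when the crawler has to log in to a secure site could look roughly like the sketch below. It assumes the site uses HTTP Basic authentication over HTTPS, which the thread does not state; a form-based login would need a different approach. The URL, user name, and password are hypothetical placeholders, and the helper name `build_authenticated_opener` is invented for illustration. The calls are from Python 3's standard `urllib.request` module.

```python
import urllib.request

# Hypothetical values for illustration only; none of these come from the thread.
SITE = "https://secure.example.com/"
USER = "crawler"
PASSWORD = "s3cret"


def build_authenticated_opener(site, user, password):
    """Return (opener, password_mgr) that answer Basic auth challenges for `site`."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means: use these credentials for any realm under this URL.
    password_mgr.add_password(None, site, user, password)
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    # Keep session cookies between requests, since many sites set them after login.
    cookie_handler = urllib.request.HTTPCookieProcessor()
    opener = urllib.request.build_opener(auth_handler, cookie_handler)
    return opener, password_mgr


opener, password_mgr = build_authenticated_opener(SITE, USER, PASSWORD)
# Nothing is sent over the network here. A crawler would then call, for example,
# opener.open(SITE + "page.html") to fetch a protected page.
```

Using an `https://` URL means the credentials travel over an encrypted connection, which covers the encryption point raised above. In a real crawler the password would come from a protected configuration source rather than a constant in the code.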
RE: [Robots] robot in python?
BUSH CALLS ON SENATE TO RATIFY CYBERCRIME TREATY

President Bush has asked the US Senate to ratify the first international cybercrime treaty. Bush called the Council of Europe's controversial treaty "an effective tool in the global effort to combat computer-related crime" and "the only multilateral treaty to address the problems of computer-related crime and electronic evidence gathering."

http://news.com.com/2100-1028_3-5108854.html

This looks likely to affect the coding environment. I would appreciate comments on its pertinence. I don't think Bush has time to spend developing code, however.
RE: [Robots] robot in python?
developing a crawler for a search engine with Python
==

You may find that the threads and exceptions in Python more than make up for anything you are missing in Perl. The Python libraries are not as extensive, but that is mostly because they have one of everything instead of five or six of everything. Extracting links using a regular HTML parser works fine, and isn't that much work. One of the major issues in an HTML parser is dealing with all the illegal HTML on the web.

This concluding statement [above] comes from Walter and outlines the coding advantages of using Python. A person capable of inventing these statements on the spot would know them to be true. I am unclear, therefore, why portions of Verity UltraSeek [a commercial product] would need to use C or C++ modules.

Has anybody here written a webbot in Python?

Answer: Verity Ultraseek is a web crawler and search engine written in Python. Portions of it are C or C++ native modules. Ultraseek is a commercial product, so we don't give out the code. Sorry.

from: Alexander Halavais

It really depends on what you are looking for, and how tolerant of errors you are. For most of what I do, I use the HTML parser, but I have also done simple expression matching to pull out links. This tends to overestimate the links (e.g., pulling out references in comments, etc.), and often yields fragments that are not really followable, but it is at least a possibility.

I am unclear, and have an uninformed opinion, about the intent of "finding" web pages whose original author did not wish them to be found. I cannot perceive any possibility of these pages being of interest to the general public, or to the corporate citizenship. However, this statement by Walter:

This tends to overestimate the links (e.g., pulling out references in comments, etc.), and often yields fragments that are not really followable, but it is at least a possibility.

seems to indicate there is a difference between the # of pages retrieved and the "possible number of pages that could be retrieved". This numerical difference is QUITE notable, and of interest to competing coders. It is, however, a privately held #. Thanks Alex.

Specifically, I need something like the linkextor available in Perl.

petter wrote::

Yes, in fact I found some very good examples on the website "Dive Into Python", including how to do a linkextor. Quite simple.

http://diveintopython.org/html_processing/extracting_data.html

This uses SGMLParser, which presumably is more tolerant of illegal HTML.

Primary concern of Petter:: Still wonder how to handle logins, though...

This is really your determination to make. Taking anyone's opinion on the matter would end in your system being less secure. I would guess some method of encryption. I am not really clear on why a PassWrdMgr is necessary in the development of a search engine crawler. This system should really already be in place in your work environment. Maybe even a firewall?

.Ssol>. Digital Acquizitionatory Inventory Drive Imaging >Sol
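Alexander's remark that simple expression matching "tends to overestimate the links" can be illustrated with a small comparison. This is only a sketch: the sample page, the regular expression, and the class name `AnchorCollector` are invented for illustration, and Python 3's standard `html.parser` stands in for whichever parser the posters actually used.

```python
import re
from html.parser import HTMLParser

# Hypothetical page, invented for illustration.
PAGE = """
<html><body>
<a href="/real.html">A real link</a>
<!-- <a href="/old.html">commented out</a> -->
<a href="#top">Back to top</a>
</body></html>
"""

# Simple expression matching: grabs every href="..." it sees, comments included.
HREF_RE = re.compile(r'href\s*=\s*"([^"]*)"', re.IGNORECASE)


class AnchorCollector(HTMLParser):
    """Collect href values only from <a> tags the parser treats as markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


regex_links = HREF_RE.findall(PAGE)

collector = AnchorCollector()
collector.feed(PAGE)
collector.close()
parser_links = collector.links

print(regex_links)   # ['/real.html', '/old.html', '#top']
print(parser_links)  # ['/real.html', '#top']
```

The regular expression reports the link inside the HTML comment, while the parser skips it. Both still return the `#top` fragment, which is not really followable, so either approach needs a filtering step before the crawler queues URLs.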
Re: [Robots] robot in python?
Walter Underwood wrote:

Python is of course a smaller language, so the libraries aren't as extensive as the Perl counterparts. Also, I find the documentation somewhat lacking (or it could be me being new to the language).

You may find that the threads and exceptions in Python more than make up for anything you are missing in Perl. The Python libraries are not as extensive, but that is mostly because they have one of everything instead of five or six of everything.

Yup, that's why I'm learning Python! I got tired of the "after the fact" object orientation and the sometimes maddening syntax of Perl.

Extracting links using a regular HTML parser works fine, and isn't that much work. One of the major issues in an HTML parser is dealing with all the illegal HTML on the web.

Yes, in fact I found some very good examples on the website "Dive Into Python", including how to do a linkextor. Quite simple.

http://diveintopython.org/html_processing/extracting_data.html

This uses SGMLParser, which presumably is more tolerant of illegal HTML. Still wonder how to handle logins, though...

/petter
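For readers on Python 3, a link extractor along the lines discussed above might look like the sketch below. Note the swap: the Dive Into Python example uses `SGMLParser` from `sgmllib`, which was removed in Python 3, so this sketch uses `html.parser.HTMLParser` instead. The names `LinkExtractor` and `extract_links`, the sample HTML, and the base URL are all hypothetical and not taken from the thread or from Dive Into Python.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute, fragment-free URLs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links, then drop any "#fragment" part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Parse `html` and return the list of links found, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical, deliberately sloppy HTML: unclosed tags and an unquoted attribute.
SLOPPY = '<p><a href=page2.html>next<a HREF="/about">about<b>unclosed'
links = extract_links(SLOPPY, "http://www.example.com/dir/index.html")
print(links)
```

The sloppy input is there to show that the parser does not raise on unclosed tags or unquoted attribute values, which matters given how much illegal HTML a crawler meets. Resolving against the base URL and stripping fragments gives the crawler URLs it can fetch directly.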