Hi All,

I need an open-source web crawler (a la wget), but one that does
all of the following:
1. Performs breadth-first search, not depth-first search (so a stopping
condition based on disk space yields a wide crawl rather than a deep
one).
2. Lets me define whether to recurse into a link or not, based on
criteria (staying within the domain being the most obvious, but also
matching the URL against a regexp, etc.).
3. Optimally, lets me supply a lambda function that returns a rating
based on page content, so I can decide whether to recurse and what to
avoid.
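To make the requirements concrete, here is a minimal sketch in Python of the crawl loop I have in mind. Everything here is hypothetical (the names `bfs_crawl`, `should_follow`, and `rate_page` are mine, not from any existing tool), and the page fetcher is injected as a callable so the logic can be exercised without network access:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects href targets from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def bfs_crawl(start_url, fetch, should_follow, rate_page, max_pages=100):
    """Breadth-first crawl (requirement 1: a FIFO queue, not recursion).

    fetch(url) -> HTML string (injected; in real use, an HTTP GET).
    should_follow(url) -> bool, filters links before queueing
      (requirement 2: domain check, URL regexp, etc.).
    rate_page(url, html) -> numeric score (requirement 3); pages
      scoring <= 0 are recorded but not expanded.
    Returns {url: score} for every page visited, in BFS order.
    """
    seen = {start_url}
    queue = deque([start_url])
    ratings = {}
    while queue and len(ratings) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        score = rate_page(url, html)
        ratings[url] = score
        if score <= 0:
            continue  # rated poorly: do not recurse into its links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            if link not in seen and should_follow(link):
                seen.add(link)
                queue.append(link)
    return ratings
```

In real use, `fetch` would wrap `urllib.request.urlopen` or similar, `should_follow` would compare `urlparse(url).netloc` against the start domain or apply a regexp, and the stopping condition would check disk usage rather than a page count.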

Anyone?

I will write such a thing if none is found, but I would really prefer not to.

Shachar Tal
Verint Systems




