Look into webBase, pavuk, and wget; there are other similar free
products out there as well.
(I am not sure I fully understand all of your requirements, though;
if you wish, feel free to clarify them.)
We also have web crawlers that offer more flexibility, but they are
not free.

Hope that helps,
Krishna Jha

Mark Friedman wrote:
>
> I am looking for a spider/gatherer with the following characteristics:
>
>    * Enables the control of the crawling process by URL
>      substring/regexp and HTML context of the link.
>    * Enables the control of the gathering (i.e. saving) process by
>      URL substring/regexp, MIME type, other header information, and
>      ideally by some predicates on the HTML source.
>    * Some way to save page/document metadata, ideally in a database.
>    * Freeware, shareware or otherwise inexpensive would be nice.
>
> Thanks in advance for any help.
>
> -Mark
