Look into webBase, pavuk and wget - there are other similar free products out there as well. (I am not sure I fully understand all of your requirements, though; feel free to clarify them for me.) We also have web crawlers that offer more flexibility, but they are not free.
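None of those tools expose exactly the knobs you list, so as a rough sketch of the kind of control you are describing - follow links only when the URL matches a regexp, save a document only when its MIME type qualifies, record per-page metadata in a database - here is a toy crawler in Python. The seed URL, the regexp, the MIME list and the SQLite schema are all placeholders of mine, not features of webBase, pavuk or wget:

    # Toy crawler sketching Mark's three requirements; every constant
    # below is an illustrative placeholder, not part of any real tool.
    import re
    import sqlite3
    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    CRAWL_RE = re.compile(r"^https?://example\.org/docs/")  # crawl control: URL regexp
    SAVE_MIME = {"text/html", "application/pdf"}            # gather control: MIME types

    class LinkParser(HTMLParser):
        """Collect <a href> targets as absolute URLs in self.links."""
        def __init__(self, base):
            super().__init__()
            self.base, self.links = base, []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(urljoin(self.base, value))

    def crawl(seed, db_path="meta.db", limit=50):
        db = sqlite3.connect(db_path)
        db.execute("CREATE TABLE IF NOT EXISTS pages "
                   "(url TEXT PRIMARY KEY, mime TEXT, length INTEGER)")
        queue, seen = [seed], {seed}
        while queue and limit:
            url = queue.pop(0)
            limit -= 1
            try:
                with urllib.request.urlopen(url) as resp:
                    mime = resp.headers.get_content_type()
                    body = resp.read()
            except OSError:
                continue  # skip unreachable pages
            if mime in SAVE_MIME:
                # gathering control: save metadata only for qualifying MIME types
                db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
                           (url, mime, len(body)))
                db.commit()
            if mime == "text/html":
                # crawling control: follow only links whose URL matches the regexp
                parser = LinkParser(url)
                parser.feed(body.decode("utf-8", errors="replace"))
                for link in parser.links:
                    if CRAWL_RE.match(link) and link not in seen:
                        seen.add(link)
                        queue.append(link)
        db.close()

    if __name__ == "__main__":
        crawl("https://example.org/docs/")

Your other two criteria would slot into the same two branches: header-based predicates go next to the MIME test, and predicates on the HTML context of a link go inside LinkParser, which already sees the surrounding tags.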
Hope that helps,
Krishna Jha

Mark Friedman wrote:
>
> I am looking for a spider/gatherer with the following characteristics:
>
> * Enables the control of the crawling process by URL
>   substring/regexp and HTML context of the link.
> * Enables the control of the gathering (i.e. saving) processes by
>   URL substring/regexp, MIME type, other header information and
>   ideally by some predicates on the HTML source.
> * Some way to save page/document metadata, ideally in a database.
> * Freeware, shareware or otherwise inexpensive would be nice.
>
> Thanks in advance for any help.
>
> -Mark