On Wed, 6 Dec 2000, Curtis Ireland wrote:
> 2) Before htDig starts its database build, dump all the links to a text
> file and have the htdig.conf include this file
>
> The one problem with these two solutions is how would the limit_urls_to
> variable work? I want to make sure the links are properly indexed
> without going past the linked site.
This is the method I used, though in my case the backend was an email full
of links from the person directing the crawl. :)
Write two files, one for start_url and one for limit_urls_to, and include
both in the conf file like so:
start_url: `/home/htdig/conf/start_url_file`
limit_urls_to: `/home/htdig/conf/limit_url_file`
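Both files contain nothing but URLs. The start file lists the pages to begin
crawling from; the limit file lists the URL prefixes the crawl must stay
within, so htdig will not follow links off the listed sites.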
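If the links come out of a database or some other backend, a small script
can generate both files before each dig. The following is only a rough
sketch, not what I actually ran: the function names are made up for
illustration, and it assumes that "not going past the linked site" means
limiting the crawl to the scheme and host of each link. Adjust the prefix
logic if you need to restrict to a subdirectory instead.

```python
from urllib.parse import urlsplit


def build_url_lists(links):
    """Return (start_urls, limit_prefixes) for an iterable of links.

    Assumption: each limit prefix is "scheme://host/" of a link, so the
    crawl stays on the sites that were explicitly listed.
    """
    start_urls = []
    limit_prefixes = []
    for link in links:
        link = link.strip()
        if not link:
            continue  # ignore blank entries
        parts = urlsplit(link)
        if not parts.scheme or not parts.netloc:
            continue  # ignore anything that is not an absolute URL
        if link not in start_urls:
            start_urls.append(link)
        prefix = f"{parts.scheme}://{parts.netloc}/"
        if prefix not in limit_prefixes:
            limit_prefixes.append(prefix)
    return start_urls, limit_prefixes


def write_url_file(path, urls):
    """Write one URL per line to the file that the conf file includes."""
    with open(path, "w") as fh:
        fh.write("\n".join(urls) + "\n")


# Example use (paths match the conf lines above):
#   starts, limits = build_url_lists(links_from_backend)
#   write_url_file("/home/htdig/conf/start_url_file", starts)
#   write_url_file("/home/htdig/conf/limit_url_file", limits)
```

Run it before each dig so both files reflect the current set of links.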
Good luck,
Bill Carlson
--
Systems Programmer [EMAIL PROTECTED] | Opinions are mine,
Virtual Hospital http://www.vh.org/ | not my employer's.
University of Iowa Hospitals and Clinics |