On Wed, 6 Dec 2000, Curtis Ireland wrote:

> 2) Before htDig starts its database build, dump all the links to a text
> file and have the htdig.conf include this file
>
> The one problem with these two solutions is how would the limit_urls_to
> variable work? I want to make sure the links are properly indexed
> without going past the linked site.

This is the method I used, though in my case the backend was an email full
of links from the person directing the crawl. :)

Write two files, one for start_url and one for limit_urls_to, and include
both in the conf file like so:

start_url:              `/home/htdig/conf/start_url_file`

limit_urls_to:          `/home/htdig/conf/limit_url_file`


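The contents of both files are just links. The backquotes around each file
name tell htdig to substitute the contents of that file as the value.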
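
In case it helps, here is a rough sketch of how the two files could be
generated from a list of links. It is not the script I used, and it makes
an assumption of its own: the limit file gets the scheme and host of each
link, so the crawl stays on the linked sites. The function names and the
output file names are made up for illustration; point them at whatever
paths your conf file references.

```python
from urllib.parse import urlsplit


def build_url_lists(links):
    """Return (start_urls, limit_urls) built from an iterable of link strings.

    start_urls keeps each absolute link once, in input order.
    limit_urls holds one "scheme://host/" prefix per distinct site
    (an assumption about how far the crawl should be allowed to go).
    """
    start_urls = []
    limit_urls = []
    for link in links:
        link = link.strip()
        if not link:
            continue  # ignore blank lines
        parts = urlsplit(link)
        if not parts.scheme or not parts.netloc:
            continue  # ignore anything that is not an absolute URL
        if link not in start_urls:
            start_urls.append(link)
        site = f"{parts.scheme}://{parts.netloc}/"
        if site not in limit_urls:
            limit_urls.append(site)
    return start_urls, limit_urls


def write_url_file(path, urls):
    """Write one URL per line to the given path."""
    with open(path, "w") as handle:
        handle.write("\n".join(urls) + "\n")


if __name__ == "__main__":
    # Hypothetical input; in practice this would come from your link source.
    links = [
        "http://www.htdig.org/FAQ.html",
        "http://www.htdig.org/mail/menu.html",
    ]
    start, limit = build_url_lists(links)
    write_url_file("start_url_file", start)   # hypothetical output path
    write_url_file("limit_url_file", limit)   # hypothetical output path
```

Run it before each htdig database build so the conf file always picks up
the current set of links. If you would rather limit the crawl to the exact
links and nothing else, write the same list to both files.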

Good Luck,

Bill Carlson
-- 
Systems Programmer    [EMAIL PROTECTED]    |  Opinions are mine,
Virtual Hospital      http://www.vh.org/        |  not my employer's.
University of Iowa Hospitals and Clinics        |

