I have over 1.6 million pages on my site, and ht://dig wants to reindex
*all* of them every time it digs.

I tried setting up a page that only includes *new* links for it to dig, but
it goes ahead and digs all the old links in its database as well.
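
For reference, that new-links page is nothing fancy: just a bare HTML list of
the most recent archive URLs, something like the sketch below (the filenames
are only placeholders):

<html><body>
<!-- placeholder filenames; the real links all point under archives/3/1/ -->
<a href="http://db.geocrawler.com/archives/3/1/new-0001.html">new message 1</a><br>
<a href="http://db.geocrawler.com/archives/3/1/new-0002.html">new message 2</a><br>
</body></html>

Every link stays under the same archives/3/1/ tree that limit_urls_to allows.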

I am *not* using the -i option.

Why won't it just dig the new links and add those pages to the database?
It's totally impractical to have it reindex the entire web site every day (in
fact, each dig takes 4 days).

Dig command:

/atlas18gb/htdig/bin/htdig -c /atlas18gb/htdig/conf/1.conf -s >> /atlas18gb/htdig/1.db/dig.log
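
For comparison, the initial dig that built this database was, I believe,
essentially the same command with -i added, i.e. roughly:

/atlas18gb/htdig/bin/htdig -i -c /atlas18gb/htdig/conf/1.conf -s >> /atlas18gb/htdig/1.db/dig.log

As I understand it, -i makes htdig erase the old databases and rebuild from
scratch, so the command above, without -i, should be doing an update dig.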

This is my 1.conf, excluding the .gif stuff:

----start----

database_dir:        /atlas18gb/htdig/1.db
start_url:           http://db.geocrawler.com/archives/3/1/
limit_urls_to:       http://db.geocrawler.com/archives/3/1/
backlink_factor:     0
sort:                score

limit_urls_to:          <<--- (stray empty entry; OK, I'll fix this)
exclude_urls:           /cgi-bin/ .cgi
maintainer:             [EMAIL PROTECTED]
max_head_length:        10000
#server_wait_time:      1
max_doc_size:           1500000
search_algorithm:       exact:1 synonyms:0.5 endings:0.1

----end----
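
As for the stray limit_urls_to line above: the fix I have in mind is simply
to delete it, since limit_urls_to is already set a few lines earlier, leaving
just:

start_url:           http://db.geocrawler.com/archives/3/1/
limit_urls_to:       http://db.geocrawler.com/archives/3/1/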

Thanks! ht://dig is working really well; I just need to get rid of these
last few glitches!

Tim Perdue
PHPBuilder.com / GotoCity.com / Geocrawler.com


