According to Adam Rice:
> I'm having a problem where my search results are out-of-date with
> respect to the site, even though htdig is definitely running, and
> definitely fetching the files from the web server, and not giving
> errors. Perhaps I am misunderstanding what an update dig does? I thought
> that it checked every document in its database, and rescanned it if it
> was new, as well as following any links to new documents, and removing
> it if it gets a 404.
>
> I run htdig and htmerge with the -a commandline options. I then move the
> *.docdb.work, *.docs.index.work and *.words.db.work files to *.docdb,
> *.docs.index and *.words.db respectively. I don't actually use
> wildcards, the *s are just there because I have different databases for
> different sites. I then copy the *.docdb file back to *.docdb.work so
> that it is there for the next update dig. The *.wordlist.work file is
> left alone ready for the next update.
>
> Does that procedure sound correct? All the pages on the sites use
> server-side includes, and hence don't have Last-Modified: headers, could
> that be confusing matters?
The procedure above sounds correct to me, but for dynamic content with no
Last-Modified headers, you need to set

    modification_time_is_now: true

in your configuration file for 3.1.x. In 3.2.0, this attribute is gone,
and htdig always assumes the current time for any missing Last-Modified
header.
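
For what it's worth, the file shuffling you describe could be sketched as
a small shell function along these lines. This is only a rough sketch,
not a tested recipe: the function name promote_work_files and the paths
in the usage comment are made up for illustration, and I'm assuming each
site's database files share a common path prefix.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of ht://Dig):
# promote the .work files left by "htdig -a" and "htmerge -a" to the live
# names, then seed the next update dig with a copy of the document database.
# $1 is the database path prefix for one site, e.g. /path/to/db/site1.
promote_work_files() {
    base=$1

    # Refuse to touch anything unless all three work files are present.
    for suffix in docdb docs.index words.db; do
        if [ ! -f "$base.$suffix.work" ]; then
            echo "missing $base.$suffix.work" >&2
            return 1
        fi
    done

    # Move the freshly built files into place.
    mv "$base.docdb.work"      "$base.docdb"      || return 1
    mv "$base.docs.index.work" "$base.docs.index" || return 1
    mv "$base.words.db.work"   "$base.words.db"   || return 1

    # Copy the document database back so the next update dig starts from it.
    # The *.wordlist.work file is deliberately left alone.
    cp "$base.docdb" "$base.docdb.work" || return 1
}

# Possible use for one site (config and database paths are assumptions):
#   htdig   -a -c /path/to/site1.conf &&
#   htmerge -a -c /path/to/site1.conf &&
#   promote_work_files /path/to/db/site1
```

Chaining the steps with && means the live files are only replaced if
both htdig and htmerge exit successfully.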
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930