At 1:44 AM -0700 3/2/01, D Salisbury wrote:
>If I have an existing database, leave off the -i, and give htdig
>a completely different start url, why then does it seem to go
>back through the previously indexed pages and tell me "retrieved but not
The default behavior for an "update" (a run without -i) is to check
whether any of the pages already in the database have changed. The
"retrieved but not changed" message says that the server ignored the
If-Modified-Since header and sent the document anyway, but the date
returned by the server matches the date in the database.
(In the case of CGIs and the like, the server typically doesn't
return a Last-Modified: header, so either there's no date in the
database, or you've set the modification_time_is_now attribute, in
which case htdig will use the current date and fetch the page every
time.)
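
To make the cases above concrete, here is a rough Python sketch of the
decision being described. It is not ht://Dig's actual code: the
function name classify_update and every label except "retrieved but
not changed" are made up for illustration, and it assumes the stored
date is kept as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def classify_update(status: int,
                    last_modified: Optional[str],
                    stored: Optional[datetime]) -> str:
    """Hypothetical helper: label the outcome of re-checking one URL."""
    if status == 304:
        # The server honored If-Modified-Since and sent no body.
        return "not changed"
    if status >= 400:
        # Error responses such as 404 mark the URL as bad.
        return "bad url"
    if last_modified is None or stored is None:
        # No date to compare (typical for CGI output): treat as changed.
        return "retrieved"
    if parsedate_to_datetime(last_modified) == stored:
        # The server sent the document anyway, but the dates match.
        return "retrieved but not changed"
    return "retrieved"


stored = datetime(2001, 3, 2, 8, 44, 0, tzinfo=timezone.utc)
print(classify_update(200, "Fri, 02 Mar 2001 08:44:00 GMT", stored))
```

The last line covers the case from the question: the server returns
the full document with a 200, but its Last-Modified date equals the
stored one.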
>or "invalidate" a url that no longer exists? ( perhaps that's what the
>"retrieved but not changed" check is for, but I really don't want it to
>check _every_
URLs that return an error (e.g. a 404) will be removed by htmerge if
you have the remove_bad_urls attribute set. Otherwise, there isn't an
easy way in the 3.1.x series to delete a URL once it's in there. On
the other hand, the new htpurge utility does exactly this for 3.2.
(We listened.)
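
For reference, the attribute mentioned above would be set in the
config file roughly like this. This is a sketch of a fragment only,
assuming the usual "attribute: value" syntax, so check the attribute
documentation for your version:

```
# htdig.conf fragment (illustrative)
# Let htmerge drop URLs that returned an error such as a 404.
remove_bad_urls: true
```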
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/