Sorry, I confused -a with -i in my previous message. What I meant was:

I just noticed that the nightly update dig of one of my mailing
list archives (produced by MHonArc) misses new messages added to
the archive. I suspect I know why this happens; can anybody confirm
this?

In the archive, the index pages (thread and date index) carry the meta
tag <META name=robots content="noindex,follow">
because I don't want these pages themselves to show up in search results,
but they are the only way htdig can find all the messages. From the
debug output of the nightly update digs, it seems that htdig
without -i only rechecks the documents already in the database for changes.
But the index pages are excluded from the database, so it never rechecks
them for new links.
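To illustrate the suspicion, here is a toy Python model. It is not htdig's actual code: the `site` dictionary, the page names and the functions `initial_dig` and `update_dig` are made up for illustration, and the update dig is assumed to start only from documents already in the database.

```python
# Hypothetical toy model of the suspected behaviour; NOT htdig's code.
# Each page maps to (noindex flag, list of outgoing links).
site = {
    "index.html": (True, ["msg1.html", "msg2.html"]),  # noindex,follow
    "msg1.html": (False, []),
    "msg2.html": (False, []),
}

def crawl(site, start_urls):
    """Follow links from start_urls; return the indexable pages reached."""
    stored, seen, queue = set(), set(), list(start_urls)
    while queue:
        url = queue.pop()
        if url in seen or url not in site:
            continue
        seen.add(url)
        noindex, links = site[url]
        if not noindex:
            stored.add(url)   # only indexable pages go into the database
        queue.extend(links)   # "follow": links are traversed either way
    return stored

def initial_dig(site, start):
    """Initial dig (-i): start from the start URL."""
    return crawl(site, [start])

def update_dig(site, database):
    """Assumed update dig: start only from pages already in the database."""
    return set(database) | crawl(site, database)

db_before = initial_dig(site, "index.html")
# A new message arrives; only the noindex index page links to it.
site["index.html"] = (True, ["msg1.html", "msg2.html", "msg3.html"])
site["msg3.html"] = (False, [])
db_after = update_dig(site, db_before)
print(sorted(db_after))  # ['msg1.html', 'msg2.html']: msg3.html is missed
```

Because `index.html` never enters the database, the update dig has no path to `msg3.html`.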

Shouldn't an update dig either traverse the document space starting
from the start URLs (in the same way an initial dig does), or keep
a list of excluded documents and check those as well?
That is, while an excluded document should not be used by htsearch, it
should still be checked by an update dig.
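The second suggestion (remembering excluded documents) can be sketched with the same kind of toy model. Again this is only an illustration under assumptions: the `site` dictionary and the names `initial_dig` and `update_dig` are hypothetical, and htdig does not necessarily store anything like the `excluded` set.

```python
# Hypothetical sketch of the proposal: remember noindex pages separately
# and revisit them on update digs. NOT htdig's code.
site = {
    "index.html": (True, ["msg1.html", "msg2.html"]),  # noindex,follow
    "msg1.html": (False, []),
    "msg2.html": (False, []),
}

def crawl(site, start_urls):
    """Follow links; return (indexable pages, excluded noindex pages)."""
    stored, excluded, seen = set(), set(), set()
    queue = list(start_urls)
    while queue:
        url = queue.pop()
        if url in seen or url not in site:
            continue
        seen.add(url)
        noindex, links = site[url]
        # Excluded pages are kept out of search results but remembered.
        (excluded if noindex else stored).add(url)
        queue.extend(links)
    return stored, excluded

def initial_dig(site, start):
    return crawl(site, [start])

def update_dig(site, database, excluded):
    """Revisit stored pages AND remembered excluded pages."""
    return crawl(site, database | excluded)

db, excl = initial_dig(site, "index.html")
site["index.html"] = (True, ["msg1.html", "msg2.html", "msg3.html"])
site["msg3.html"] = (False, [])
new_db, new_excl = update_dig(site, db, excl)
print(sorted(new_db))    # ['msg1.html', 'msg2.html', 'msg3.html']
print(sorted(new_excl))  # ['index.html']
```

The excluded index page is still never returned to htsearch, but because it is revisited, its new link to `msg3.html` is found.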

Roman Maeder


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>

Reply via email to