Greetings Neal,

On Sat, 4 Oct 2003 11:00, Neal Richter wrote:
> If the timestamps are the same we don't bother to download it.
>
> > I think you misinterpreted what Lachlan suggested, i.e. the case
> > where Y does NOT change.  If Y is the only document with a link
> > to X, and Y does not change, it will still have the link to X, so
> > X is still "valid". However, if Y didn't change, and htdig
> > (without -i) doesn't reindex Y, then how will it find the link to
> > X to validate X's presence in the db?
>
>   Changing Y is the point!

Agreed, changing Y is what triggers the current bug.  However, I 
believe that a simple fix of the current bug will introduce a *new* 
bug for the more common case that Y *doesn't* change.  Reread 
Gilles's scenario and try to answer his question.  I'd explain it 
more clearly, but I don't have a napkin handy :)

If we get around to implementing Google's link analysis, as Geoff 
suggested, then we may be able to fix the problem properly.  It seems 
that any fix will have to look at all links *to* a page, and mark as 
"obsolete" those *links* where (a) the link-from page ("Y") has 
changed and (b) it no longer contains the link.  After the dig, all 
pages in the database must be checked, and any page left with no 
non-obsolete links to it can itself be marked as obsolete.
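
To make the two-step rule above concrete, here is a rough sketch in 
Python (not htdig code).  Everything in it is an assumption made for 
illustration: the function name, the dict-of-sets layout for the 
stored links, and the "roots" set for start URLs, which have no 
inbound links but should never be dropped.  It does a single pass 
only, so it does not cascade from a newly obsolete page to the pages 
that only it links to.

```python
def find_obsolete_pages(inbound, changed, current_links, roots=frozenset()):
    """Return the pages that have no remaining non-obsolete inbound link.

    inbound:       dict mapping each page in the db to the set of pages
                   recorded (from earlier digs) as linking to it
    changed:       set of pages that were re-fetched and found modified
    current_links: dict mapping each changed page to the set of links
                   found when it was re-parsed
    roots:         start URLs, never marked obsolete (assumption)
    """
    obsolete = set()
    for target, sources in inbound.items():
        if target in roots:
            continue
        live = [
            src for src in sources
            # A link is obsolete only if (a) its source page changed
            # and (b) the re-parsed source no longer contains it.
            if not (src in changed
                    and target not in current_links.get(src, set()))
        ]
        if not live:
            obsolete.add(target)
    return obsolete
```

Note that when Y is unchanged, its stored link to X stays live without 
Y being re-fetched, which is the case Gilles was asking about.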

> However I would strongly recommend we enable head_before_get by
> default. We're basically wasting bandwidth like drunken sailors
> with it off!!!

Good suggestion.  If we want some code bloat, we could have an "auto" 
mode, which would use  head_before_get  unless  -i  is specified 
(with  -i  we'll always have to do the "get" anyway, so the "head" 
would just be an extra request)...
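
Something like the following, sketched in Python rather than htdig's 
own code.  The "auto" value is only the proposal above and does not 
exist as a setting; the function name and the string values are made 
up for the example.

```python
def use_head_before_get(setting, initial_dig):
    """Decide whether to send a HEAD request before the GET.

    setting:     "true", "false", or the proposed (hypothetical) "auto"
    initial_dig: True when the dig was started with -i
    """
    if setting == "auto":
        # With -i every document is fetched anyway, so a HEAD would
        # only add a request; on an update dig it can save the GET.
        return not initial_dig
    if setting in ("true", "false"):
        return setting == "true"
    raise ValueError("unknown head_before_get setting: %r" % (setting,))
```

So an update dig with "auto" would send the HEAD first, and a dig 
with  -i  would go straight to the GET.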

Cheers,
Lachlan

-- 
[EMAIL PROTECTED]
ht://Dig developer DownUnder  (http://www.htdig.org)


