On Tue, 20 Mar 2001, Gilles Detillieux wrote:

> It turns out that htdig does a depth-order traversal of the document tree,
> so really the hop count should always be increasing, never decreasing.

<sigh> It's been ages since I caught myself in loops (pun intended) with
hopcounts. Alas, it is not quite so simple. In part, servers can refer to
each other.

www.foo.com -> 1.html -> 2.html
www.bar.com -> www.foo.com/2.html (oops!)

It's complicated because with multiple servers, we don't always do an
exact depth-first search. For example, with 3.2 we can index a few
documents in a row on one server before jumping to another, which is great
for HTTP performance, but it means documents aren't visited in strict
hopcount order. So the servers keep URLs in a priority queue by hopcount.
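
To make the queueing idea concrete, here is a minimal sketch of per-server
queues ordered by hopcount. This is not the actual htdig code (which is
C++); the class name ServerQueues and its push/pop methods are made up
purely for illustration.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlsplit


class ServerQueues:
    """Hypothetical per-server URL queues, each ordered by hop count."""

    def __init__(self):
        # host -> heap of (hopcount, sequence number, url)
        self._queues = defaultdict(list)
        # Sequence number breaks ties so equal hop counts stay in insertion order.
        self._seq = 0

    def push(self, url, hopcount):
        host = urlsplit(url).netloc
        heapq.heappush(self._queues[host], (hopcount, self._seq, url))
        self._seq += 1

    def pop(self, host):
        """Return (url, hopcount) for the lowest hop count queued on this host."""
        hopcount, _, url = heapq.heappop(self._queues[host])
        return url, hopcount


queues = ServerQueues()
queues.push("http://www.foo.com/deep.html", 3)
queues.push("http://www.foo.com/1.html", 1)
queues.push("http://www.foo.com/2.html", 2)

# Regardless of insertion order, the lowest hop count comes out first.
first = queues.pop("www.foo.com")
second = queues.pop("www.foo.com")
```

With a queue like this, several documents can be fetched in a row from one
server while still taking the shallowest ones first on that server.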

In 3.1, URLs are put on the queue in a semi-haphazard fashion. Let's
continue the example above. We put foo.com/1 onto the queue (hop 1), then
go to bar.com and add some URLs, including foo.com/2 (also hop 1). We go
back to foo.com to index 1.html and then we hit the problem in question.
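
The scenario above can be replayed in a few lines. This is only a sketch
under my own assumptions: record_link and the hops dictionary are
hypothetical names, start URLs are taken to be hop 0, and the rule shown
(touch only the referenced document, and only to lower its hopcount) is
one reading of the fix under discussion, not the code in htdig.

```python
def record_link(hops, current_url, target_url):
    """Record a link from current_url to target_url.

    Only the referenced document's hop count is touched, and only to set it
    for the first time or to lower it. Returns True if it changed.
    """
    candidate = hops[current_url] + 1
    known = hops.get(target_url)
    if known is None or candidate < known:
        hops[target_url] = candidate
        return True
    return False


# Assumption: both start URLs sit at hop 0.
hops = {"http://www.foo.com/": 0, "http://www.bar.com/": 0}

# foo.com links to 1.html, bar.com links to foo.com/2.html: both land at hop 1.
record_link(hops, "http://www.foo.com/", "http://www.foo.com/1.html")
record_link(hops, "http://www.bar.com/", "http://www.foo.com/2.html")

# Back on foo.com, 1.html (hop 1) also links to 2.html. That path would be
# hop 2, which is longer than the one already known, so nothing changes.
changed = record_link(hops, "http://www.foo.com/1.html", "http://www.foo.com/2.html")
```

The point is that 2.html keeps the shorter path found via bar.com, and
1.html's own hopcount is never modified by following its links.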

> In 3.2.0b3, Geoff tried to fix it, but IMHO ended up breaking it even
> more, with this patch: "http://www.htdig.org/mail/1998/11/0345.html".

I believe that's 3.1.0b3. And the previous code is obviously wrong as the
example above indicates. (We'd suddenly give 2.html credit for a longer
path!?) But I think I was imagining a non-existent possibility.

> hop count to drop?  It seems any change should be to the referenced
> document, not to the current one.  Can you let me know if my patch
> breaks anything?

Your code is correct, but I still need to think about loops for a bit
more. It's good you brought it up since I see some other cleanups in
there...

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
