According to Malcolm Austen:
> On Tue, 13 Mar 2001, Malcolm Austen wrote:
> + On Mon, 12 Mar 2001, Gilles Detillieux wrote:
> + + Well, if you have a simple test set of data that produces this problem
> + + in 3.1.5, then please do bore us with it. Even though there have been
> + + substantial changes for 3.2, much of those have been backported to 3.1.5,
> + + so if the problem remains in 3.1.x but not in 3.2.x, I'd like to know what
> + + the cause is, so we can address this if/when we start working on 3.1.6.
>
> Gilles, (and anyone else who cares to try to resolve this!!)
>
> I got worried yesterday afternoon that I was not going to be able to
> reproduce the fault without indexing 20,000 documents. Fortunately I did
> manage it with just one server and (just) under 600 documents.
>
> I have indexed (config file at the end of this message) with a hop count
> of one and then again with a hopcount of two. The result of the second run
> is some 299 documents with hopcounts of 1 that were not indexed in the
> first run. ...

OK, after looking at the files on your site, I was able to reproduce the
problem on my site. (Actually the problem was there all along, I just
didn't know what to look for because hop count isn't an issue for us.)

It turns out that htdig does a depth-order traversal of the document tree,
so really the hop count should always be increasing, never decreasing.
Hunting around in the code, I was able to find out why it was decreasing,
and looking back in earlier versions, I found out when it broke.
It was working in versions 3.0.8b2 and 3.1.0b1, but broken in 3.1.0b2.
In 3.1.0b3, Geoff tried to fix it, but IMHO ended up breaking it even
more, with this patch: "http://www.htdig.org/mail/1998/11/0345.html".

The problem was that in preparation for 3.1.0b2, Geoff made a number of
other hopcount-related changes. These were all good, as far as I can
tell, except for one: inexplicably, he reversed the comparison from:

    if (ref->DocHopCount() > currenthopcount + 1)

to:

    if (ref->DocHopCount() < currenthopcount + 1)

causing htdig to take the higher hop count rather than the lower one! This
led to the problem which led to the patch referenced above. My fix is
to go back to the way 3.0.8b2 did it, but without losing all the other
good fixes in 3.1.0b2. This fix should be applied to both the 3.1 and
3.2 series.
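
To illustrate why the comparison has to point that way, here is a minimal
standalone sketch. It is not htdig code: the names relax_hop_count,
crawl_hop_counts, LinkGraph and the kUnseen sentinel are all made up for
the example, and I'm assuming a simple level-by-level crawl. It only shows
that a referenced document's hop count should drop when the path through
the current document is shorter, and should otherwise be left alone.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a site's link structure: page -> pages it links to.
using LinkGraph = std::map<std::string, std::vector<std::string>>;

// The corrected rule: lower the referenced document's hop count only when
// the path through the current document is shorter. Returns true if changed.
bool relax_hop_count(int &ref_hops, int current_hops)
{
    assert(current_hops >= 0);
    if (ref_hops > current_hops + 1) {
        ref_hops = current_hops + 1;
        return true;
    }
    return false;
}

// Level-by-level crawl from `start`, recording the smallest hop count at
// which each page is reachable. kUnseen is an assumed "not yet seen" marker.
std::map<std::string, int> crawl_hop_counts(const LinkGraph &graph,
                                            const std::string &start)
{
    const int kUnseen = 1 << 30;
    std::map<std::string, int> hops;
    std::queue<std::string> pending;

    hops[start] = 0;
    pending.push(start);

    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        const int current_hops = hops[current];

        auto links = graph.find(current);
        if (links == graph.end())
            continue;

        for (const std::string &target : links->second) {
            // Insert the target as unseen if it is new, then apply the rule.
            auto inserted = hops.emplace(target, kUnseen);
            relax_hop_count(inserted.first->second, current_hops);
            if (inserted.second)
                pending.push(target);   // queue each page only once
        }
    }
    return hops;
}
```

With the comparison reversed, the update would only fire when the existing
count was already the smaller one, which is the "higher rather than lower"
behaviour described above.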
Geoff, can you verify this change? Your patch of 1998/11 doesn't make
sense to me, but maybe I'm not grasping what your intention really was.
Why should an href in the current document cause the current document's
hop count to drop? It seems any change should be to the referenced
document, not to the current one. Can you let me know if my patch
breaks anything?

Malcolm, and anyone else who experienced hop count problems in 3.1.5, can
you please test this patch and let me know if it fixes your problems and/or
causes new ones?

Use "patch -p0 < this-message-file" in your htdig-3.1.5 source directory...

--- htdig/Retriever.cc.orig Thu Feb 24 20:29:10 2000
+++ htdig/Retriever.cc Tue Mar 20 14:45:24 2001
@@ -1211,11 +1211,8 @@ Retriever::got_href(URL &url, char *desc
return;
}
- if (ref->DocHopCount() != -1 &&
- ref->DocHopCount() < currenthopcount + 1)
- // If we had taken the path through this ref
- // We'd be here faster than currenthopcount
- currenthopcount = ref->DocHopCount(); // So update it!
+ if (ref->DocHopCount() > currenthopcount + 1)
+ ref->DocHopCount(currenthopcount + 1);
docs.Add(*ref);
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930