Geoff Hutchison
Fri, 3 Dec 1999 12:40:35 -0800
At 2:15 PM -0600 12/3/99, Gilles Detillieux wrote:
>In the 3.2 development code, Geoff hacked it a bit so the initial hopcount
>field is set to 0, instead of -1, when DocumentRef and URLRef objects are
>first constructed. I don't know if that actually solves this problem or
>not, but in any case it doesn't get to the root of the problem: what is
>happening to those hopcounts in the first place?!

It's not really a hack. First off, a document will only have a hopcount >= 0, so making it -1 doesn't make a lot of sense IMHO. Furthermore, the database seemed to ignore the -1 listed for documents that hadn't been retrieved yet and make up a number. (I kid you not, but I can't remember the exact details. Try doing a dig with a limited server_max_docs and then do an update...)

But there are simply a *lot* of issues with hopcounts in 3.1. The biggest problem is that pages are not indexed by hopcount. On an update dig, all the pages that were in the database already are put into the queue in *alphabetical* order, ahead of any new pages. Since the queue is not ordered by hopcount, it's very difficult to ensure the hopcounts are accurate.

The indexing queue in 3.2 is based on hopcount--this guarantees that the first time it comes to a page, that was the fastest way it could get there. Furthermore, on updates, any new pages will fall into the queue in the proper place.

I don't know whether this has any influence on the particular bug mentioned, but suffice to say that fixing all the problems with hopcount in 3.1 is not going to happen--it would require backporting too much code. I'll stick by the documentation: using -h or max_hop_count is *only* reliable when you're doing an initial dig. Other results may vary.

-Geoff
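
To illustrate why queue order matters for hopcounts, here is a minimal toy sketch in Python. It is not ht://Dig code: the `crawl` function, the `links` graph, and the rule that a page's hopcount is fixed the first time it is taken off the queue are all assumptions made for the example. It only contrasts a queue ordered by hopcount with one ordered alphabetically by URL.

```python
import heapq

def crawl(links, start, by_hopcount):
    """Toy crawl: record each page's hopcount the first time it is dequeued.

    links       -- dict mapping a URL to the URLs it links to (hypothetical data)
    start       -- the starting URL, which gets hopcount 0
    by_hopcount -- True: queue ordered by hopcount; False: ordered by URL name
    """
    recorded = {}
    # Heap entries are (sort_key, url, hops); the sort key decides the order.
    queue = [(0 if by_hopcount else start, start, 0)]
    while queue:
        _, url, hops = heapq.heappop(queue)
        if url in recorded:
            continue  # hopcount was already fixed on an earlier visit
        recorded[url] = hops
        for target in links.get(url, ()):
            if target not in recorded:
                key = hops + 1 if by_hopcount else target
                heapq.heappush(queue, (key, target, hops + 1))
    return recorded

# Hypothetical site: /c is two hops away via /z, three hops away via /a -> /b.
links = {
    "/index": ["/a", "/z"],
    "/a": ["/b"],
    "/b": ["/c"],
    "/z": ["/c"],
}

ordered = crawl(links, "/index", by_hopcount=True)
alphabetical = crawl(links, "/index", by_hopcount=False)

print(ordered["/c"])       # 2: first visit is via the shortest path
print(alphabetical["/c"])  # 3: /b sorts before /z, so the longer path wins
```

In the hopcount-ordered run, every page is dequeued before any page with a larger hopcount, so the first visit is always via a shortest path. In the alphabetical run, /c is reached through /a and /b before /z is ever processed, so it ends up with an inflated hopcount.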