According to Geoff Hutchison:
> At 2:15 PM -0600 12/3/99, Gilles Detillieux wrote:
> >In the 3.2 development code, Geoff hacked it a bit so the initial hopcount
> >field is set to 0, instead of -1, when DocumentRef and URLRef objects are
> >first constructed.  I don't know if that actually solves this problem or
> >not, but in any case it doesn't get to the root of the problem: what is
> >happening to those hopcounts in the first place?!
> 
> It's not really a hack. First off, a document will only have a 
> hopcount >= 0, so making it -1 doesn't make a lot of sense IMHO. 
> Furthermore, the database seemed to ignore the -1 listed for 
> documents that hadn't been retrieved yet and make up a number. (I kid 
> you not, but I can't remember the exact details. Try doing a dig with 
> a limited server_max_docs and then do an update...)

I know!  That's what I was seeing myself.  It was often -1, but sometimes
it was 255.  This makes me wonder if it's not some odd bug in Serialize
or Deserialize.  The weird thing is when I look at the code, it always
seems to set the hopcount explicitly after creating a new DocumentRef
object, so I can't see why it would ever fall back to the constructor's
default value.  The fact that it is makes me suspect that something
is going terribly wrong, and I think it's in the database.  I called
changing the constructor's default a hack because it just conceals the
-1 that was staring at us before, which I see as an error indication.
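
To illustrate how a -1 could come back as 255, here is a minimal sketch.
It assumes a hypothetical serializer that squeezes the hopcount into a
single unsigned byte; packHopcount, unpackUnsigned and unpackSigned are
made-up names, and nothing here is taken from the actual Serialize or
Deserialize code.  It only shows that this class of bug would produce
exactly the -1/255 pair described above.

```cpp
#include <cstdint>

// Hypothetical on-disk field: the hopcount narrowed to one unsigned byte.
// This is an assumption for illustration, not ht://Dig's real record format.
std::uint8_t packHopcount(int hopcount) {
    // Conversion to an unsigned type is modular, so -1 becomes 255.
    return static_cast<std::uint8_t>(hopcount);
}

// Reading the byte back as an unsigned value loses the sign.
int unpackUnsigned(std::uint8_t byte) {
    return byte;
}

// Reading the same byte back as a signed 8-bit value recovers the sign
// (on two's complement platforms, which C++20 guarantees).
int unpackSigned(std::uint8_t byte) {
    return static_cast<std::int8_t>(byte);
}
```

With these helpers, unpackUnsigned(packHopcount(-1)) yields 255 while
unpackSigned(packHopcount(-1)) yields -1, so seeing both values for the
same logical "unset" hopcount would be consistent with two code paths
disagreeing about signedness.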

> But there are simply a *lot* of issues with hopcounts in 3.1. The 
> biggest problem is that pages are not indexed by hopcount. On an 
> update dig, all the pages that were in the database already are put 
> into the queue in *alphabetical* order, ahead of any new pages. Since 
> the queue is not ordered by hopcount, it's very difficult to ensure 
> the hopcounts are accurate.
> 
> The indexing queue in 3.2 is based on hopcount--this guarantees that 
> the first time it comes to a page, that was the fastest way it could 
> get there. Furthermore, on updates, any new pages will fall into the 
> queue in the proper place.
> 
> I don't know whether this has any influence on the particular bug 
> mentioned, but suffice to say that fixing all the problems with 
> hopcount in 3.1 is not going to happen--it would require backporting 
> too much code. I'll stick by the documentation: using -h or 
> max_hop_count is *only* reliable when you're doing an initial dig. 
> Other results may vary.
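
The hopcount-ordered queue described in the quoted text can be sketched
roughly as follows.  This is not the 3.2 code: QueuedURL, MoreHops,
LinkGraph and assignHopcounts are hypothetical names, and the link graph
is an in-memory stand-in for pages actually being fetched.  It only shows
why popping the lowest hopcount first means the first visit to a page is
via a shortest path.

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical queue entry; not one of ht://Dig's classes.
struct QueuedURL {
    std::string url;
    int hopcount;
};

// Comparator that makes std::priority_queue pop the smallest hopcount first.
struct MoreHops {
    bool operator()(const QueuedURL& a, const QueuedURL& b) const {
        return a.hopcount > b.hopcount;
    }
};

// Stand-in for the web: each URL maps to the URLs it links to.
using LinkGraph = std::unordered_map<std::string, std::vector<std::string>>;

// Walk the graph from `start` and record the hopcount of each URL's first visit.
std::unordered_map<std::string, int> assignHopcounts(const LinkGraph& links,
                                                     const std::string& start) {
    std::priority_queue<QueuedURL, std::vector<QueuedURL>, MoreHops> pending;
    std::unordered_map<std::string, int> hops;
    pending.push({start, 0});
    while (!pending.empty()) {
        QueuedURL current = pending.top();
        pending.pop();
        if (hops.count(current.url)) {
            continue;  // already visited at an equal or lower hopcount
        }
        hops[current.url] = current.hopcount;
        auto it = links.find(current.url);
        if (it == links.end()) {
            continue;  // page has no outgoing links
        }
        for (const std::string& next : it->second) {
            if (!hops.count(next)) {
                pending.push({next, current.hopcount + 1});
            }
        }
    }
    return hops;
}
```

Because new entries are pushed with their own hopcount, a page discovered
late still lands in the right place in the queue, which is the property
an alphabetically ordered update queue lacks.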

I realize that there are many other hopcount-related changes in 3.2,
and no, I don't intend to backport them all, but the reason I'm caught
up on this particular problem is that it seems to me to be a symptom of
a deeper underlying problem.  If I can rule out that it is, I'm OK with
leaving this as-is, but if I uncover something nastier, I'd see that as
reason enough to delay 3.1.4 for a day or two - if a solution is in sight.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

