According to Ivan C Chang:
> After htdig finishes indexing my site, I discovered that some of the URLs
> are duplicated with the following characteristics:
> 
> http://www.myschool.edu./
> 
> i.e. besides indexing http://www.myschool.edu, there's an extra . at the
> end of the host name in the above URL.  This happens for every descendant
> link that follows, so there are large numbers of duplicates.  I tried to
> find pages that mistakenly contain explicit links of the form
> http://www.myschool.edu. but haven't found any yet.
> 
> Isn't htdig smart enough to remove the . during the normalization process?
> How could I deal with this problem?

All it takes is one such link to make htdig traverse what it thinks is an
entirely different hierarchy on another server.  Until you track it down,
you could try a server_aliases attribute setting like this, which maps the
dotted host name back onto the real one:

server_aliases: www.myschool.edu.:80=www.myschool.edu:80
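
If you still want to hunt down the page that contains the stray link, a
small script along these lines might help.  It is only a rough sketch in
Python, not anything that ships with htdig: the function names and the
"*.htm*" file pattern are my own assumptions, and it only catches absolute
http/https URLs that are written out in the files on disk.

```python
import re
from pathlib import Path

# Matches the scheme and host part of an absolute URL whose host name ends
# in a dot, e.g. http://www.myschool.edu./ or http://www.myschool.edu.:80/
TRAILING_DOT_HOST = re.compile(
    r"""https?://[A-Za-z0-9.-]*[A-Za-z0-9]\.(?=[:/"'\s>]|$)""",
    re.IGNORECASE,
)


def find_trailing_dot_links(text):
    """Return the scheme+host of every URL in text whose host ends with a dot."""
    return TRAILING_DOT_HOST.findall(text)


def scan_tree(root):
    """Yield (file, line number, URL prefix) for each suspect link under root."""
    for path in sorted(Path(root).rglob("*.htm*")):
        if not path.is_file():
            continue
        # Replace undecodable bytes rather than failing on odd encodings.
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for url in find_trailing_dot_links(line):
                yield path, lineno, url
```

Calling scan_tree() on your document root and printing each tuple it yields
should point you at the file and line to fix.  Note that it will also flag a
URL that merely ends a sentence in running text, so expect to check the hits
by hand.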

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
