According to [EMAIL PROTECTED]:
> The site I am indexing is a bit peculiar. The following
> is an example of the setup, where each page is exactly
> the same.
>
> www.domain.com/subdirectory/
> www.domain.com/subdirectory/index.html
> www.domain.com/Subdirectory/
> www.domain.com/Subdirectory/index.html
>
> I assumed that where the URL has no index.html, the
> server is just loading index.html anyway. Here's the
> problem: htdig recognizes these as 4 different pages
> and indexes all of them. I can see why it would treat
> them as 2 different pages because of the s and S. Is
> there any way to prevent the duplicates?

The remove_default_doc attribute should take care of the superfluous
"index.html" entries, but I'm not so sure about the extra Subdirectory
names. You can't use exclude_urls for this, because it does a
case-insensitive match: a pattern meant to exclude /Subdirectory/ would
exclude /subdirectory/ as well, so you'd lose both copies.
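
As a rough sketch of the first part, assuming your version of htdig
supports the attribute and that index.html is the only default
document name your server uses, the entry in your htdig.conf would
look something like this:

    # Strip a trailing "index.html" so that .../subdirectory/index.html
    # and .../subdirectory/ are treated as the same URL.
    remove_default_doc: index.html

That only collapses the index.html duplicates. The subdirectory vs.
Subdirectory pair would still be indexed as two separate URLs.
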
On my site, I use a few symbolic links for subdirectories, to give an
all-lowercase equivalent to some mixed-case names, but I never use
these in URLs on my own pages, for this very reason. They exist only to
support links from other sites, where other admins may be a tad sloppy
about getting the case right. I realise this isn't a workable alternative
for you if you don't maintain control over the whole site you're indexing.
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
------------------------------------
List archives: <http://www.htdig.org/mail/menu.html>
FAQ: <http://www.htdig.org/FAQ.html>