According to Malcolm Austen:
> I may have read the documentation too optimistically, so I'll offer this
> for debate initially rather than going straight in as a bug report!
I appreciate that! The bug database isn't a good forum for discussion.
When in doubt, the list is a better choice.
> The description of remove_default_doc: fails to convey to me the
> 'feature' that the config line -
>
> remove_default_doc: index.htm
>
> will also strip out index.html ... hey it's a feature?
Are you sure about this? The URL::removeIndex() method uses the
StringMatch::CompareWord() method to do the actual string comparisons,
so if a pattern of index.htm actually matches index.html, then this would
suggest a bug in CompareWord(), which is an unsettling prospect indeed.
Can you please confirm this?
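The reasoning above assumes that a "word" comparison only succeeds at a word boundary. The C++ sketch below illustrates those assumed semantics. compareWordSketch() is a hypothetical stand-in, not htdig's StringMatch::CompareWord(). It assumes a pattern counts as matched only when it appears at the start of the string (ignoring case) and is followed either by the end of the string or by a character that is not a letter or digit. Under that assumption, index.htm cannot match index.html.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for StringMatch::CompareWord(); NOT htdig code.
// Assumed semantics: a pattern matches if it appears at the start of s
// (ignoring case) and is followed by the end of s or by a character that
// is not a letter or digit. On success, *matchLength (if non-null)
// receives the length of the matched pattern.
bool compareWordSketch(const std::vector<std::string>& patterns,
                       const std::string& s,
                       std::size_t* matchLength = nullptr)
{
    for (const std::string& p : patterns) {
        if (p.empty() || p.size() > s.size())
            continue;  // cannot match at the start of s

        bool same = true;
        for (std::size_t i = 0; i < p.size() && same; ++i)
            same = std::tolower(static_cast<unsigned char>(p[i])) ==
                   std::tolower(static_cast<unsigned char>(s[i]));
        if (!same)
            continue;

        // Guaranteed by the size check above; s[p.size()] is only read
        // when p is strictly shorter than s.
        assert(p.size() <= s.size());

        // Word-boundary test: "index.htm" must not match "index.html".
        if (p.size() == s.size() ||
            !std::isalnum(static_cast<unsigned char>(s[p.size()]))) {
            if (matchLength)
                *matchLength = p.size();
            return true;
        }
    }
    return false;
}
```

Under these assumptions, index.htm only matches index.htm itself. The same rule, however, still lets index.xml match index.xml.ID=something, because '.' counts as a word boundary.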
> Ah, no! Unfortunately we have a server running AxKit and delivering
> index.xml as the default document, so I made my config line -
>
> remove_default_doc: index.html index.htm index.xml
>
> and to my horror htdig is no longer indexing pages that are not the
> default document like 'index.xml.ID=something'
Ah, that would be a bug! removeIndex() makes an exception for query
strings beginning with "?", but doesn't test for other things that may
be appended to the file name. It should.
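The following is a rough C++ sketch of the unpatched flow described above, under stated assumptions. The hypothetical removeIndexSketch() leaves any path containing '?' alone. It takes the file name to start just after the last '/', which is an assumption about how the offset is computed. It then chops the file name whenever a word-boundary match succeeds, using the same assumed semantics as CompareWord(). Because nothing checks what follows the matched name, index.xml.ID=something is stripped just like index.xml.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Assumed word-boundary prefix match (hypothetical helper, not htdig code):
// true if s starts with one of the patterns, ignoring case, and the next
// character is absent or not a letter/digit.
static bool startsWithWord(const std::vector<std::string>& patterns,
                           const std::string& s)
{
    for (const std::string& p : patterns) {
        if (p.empty() || p.size() > s.size())
            continue;
        bool same = true;
        for (std::size_t i = 0; i < p.size() && same; ++i)
            same = std::tolower(static_cast<unsigned char>(p[i])) ==
                   std::tolower(static_cast<unsigned char>(s[i]));
        if (same && (p.size() == s.size() ||
                     !std::isalnum(static_cast<unsigned char>(s[p.size()]))))
            return true;
    }
    return false;
}

// Hypothetical sketch of the unpatched removeIndex() flow.
void removeIndexSketch(std::string& path,
                       const std::vector<std::string>& defaultDocs)
{
    // The one exception: anything with a query string is left alone.
    if (path.empty() || path.find('?') != std::string::npos)
        return;

    // Assumption: the file name starts just after the last '/'.
    std::size_t slash = path.rfind('/');
    if (slash == std::string::npos)
        return;
    std::size_t filename = slash + 1;
    assert(filename <= path.size());

    // Nothing checks what follows the matched name, so a suffix such as
    // ".ID=something" does not prevent the chop.
    if (startsWithWord(defaultDocs, path.substr(filename)))
        path.erase(filename);  // keep the directory part, trailing '/' included
}
```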
> Perhaps 3.2 will resolve the issue by allowing me to specify the default
> document name via a regexp? In the meantime is there any chance that 3.1.6
> could either fix the removal to only take out documents that _exactly_
> match, or introduce an option to switch the match to be exact?
3.2's removeIndex() method is almost identical to 3.1's, so no, it won't
handle regex and it won't solve this problem (yet).
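For comparison, the C++ sketch below shows roughly what exact matching could look like if the default document names were expressed as a regular expression. It is a hypothetical illustration using std::regex, not an htdig feature or configuration option. defaultDocRegex() and isDefaultDoc() are made-up names. std::regex_match() succeeds only when the entire file name matches, so index.xml.ID=something is not treated as a default document.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch only: not an htdig API. Builds a case-insensitive
// alternation of the given file names, escaping regex metacharacters.
std::regex defaultDocRegex(const std::vector<std::string>& names)
{
    assert(!names.empty());  // an empty list would match only ""
    const std::string meta = "\\^$.|?*+()[]{}";
    std::string alternation;
    for (const std::string& n : names) {
        if (!alternation.empty())
            alternation += '|';
        for (char c : n) {
            // Escape so that '.' in "index.html" is a literal dot.
            if (meta.find(c) != std::string::npos)
                alternation += '\\';
            alternation += c;
        }
    }
    return std::regex("(" + alternation + ")", std::regex::icase);
}

// regex_match() requires the whole file name to match, not just a prefix.
bool isDefaultDoc(const std::regex& re, const std::string& filename)
{
    return std::regex_match(filename, re);
}
```

Escaping the names keeps the '.' literal, and requiring a whole-string match is what excludes names with anything appended.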
> Meanwhile I think I have no option but to allow the duplicate indexing of
> .../ and .../index.xml - any ideas on how to work around the problem
> would be appreciated ...
Please try out the following patch, but only after confirming that
the unpatched code does indeed allow a pattern of index.htm to match
index.html. I haven't tested the patch myself, so I'll need to know
from you if it fixes the problem or breaks something else. Please let
me know how it goes.
Apply this in your 3.1.5 or 3.1.6 snapshot main source directory, using
"patch -p0 < this-message".
--- htlib/URL.cc.orig Thu Sep 27 17:02:10 2001
+++ htlib/URL.cc Thu Oct 11 08:59:16 2001
@@ -477,8 +477,10 @@ void URL::removeIndex(String &path)
defaultdoc->Pattern(l.Join('|'));
l.Release();
}
+ int which, length;
if (defaultdoc->hasPattern() &&
- defaultdoc->CompareWord(path.sub(filename)))
+ defaultdoc->CompareWord(path.sub(filename), &which, &length) &&
+ filename+length == path.length())
path.chop(path.length() - filename);
}
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>