I may have read the documentation too optimistically, so I'll offer this
for debate initially rather than going straight in as a bug report!
The description of remove_default_doc fails to convey to me the
'feature' that the config line
remove_default_doc: index.htm
will also strip out index.html ... hey, it's a feature?
Ah, no! Unfortunately we have a server running AxKit and delivering
index.xml as the default document, so I made my config line
remove_default_doc: index.html index.htm index.xml
and to my horror htdig is no longer indexing pages that are not the
default document but whose names merely begin with it, like
'index.xml.ID=something'.
Here's a fragment of -vv output which shows the pages being ignored:
<fragment>
*A tag: pos = 2, position = ="oxford/" class="quicklink" target="_top">
pushing http://spqr.oucs.ox.ac.uk/email/oxford/
+A tag: pos = 30, position = ="index.xml.ID=herald">
A tag: pos = 2, position = ="access/" class="quicklink" target="_top">
pushing http://spqr.oucs.ox.ac.uk/email/access/
+A tag: pos = 2, position = ="ssl/" class="quicklink" target="_top">
</fragment>
Perhaps 3.2 will resolve the issue by allowing me to specify the default
document name via a regexp? In the meantime is there any chance that 3.1.6
could either fix the removal to only take out documents that _exactly_
match, or introduce an option to switch the match to be exact?
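To make the difference concrete, here is a small Python sketch of what I
assume is happening versus what I am asking for. It is only a model of the
behaviour I observe, not htdig's actual code: the function names and the
www.example.org URLs are made up for illustration, and I am guessing that
the current test is a prefix match on the last path component.

```python
# Hypothetical model of the observed behaviour; NOT taken from the htdig source.
DEFAULT_DOCS = ["index.html", "index.htm", "index.xml"]


def split_last(url):
    """Split a URL into (directory part ending in '/', last path component)."""
    head, _, tail = url.rpartition("/")
    return head + "/", tail


def strip_prefix_match(url, names=DEFAULT_DOCS):
    """Assumed current behaviour: strip the file name if it merely
    starts with one of the listed default document names."""
    directory, filename = split_last(url)
    if any(filename.startswith(name) for name in names):
        return directory
    return url


def strip_exact_match(url, names=DEFAULT_DOCS):
    """Requested behaviour: strip the file name only if it is exactly
    one of the listed default document names."""
    directory, filename = split_last(url)
    if filename in names:
        return directory
    return url


if __name__ == "__main__":
    for url in (
        "http://www.example.org/email/index.xml",
        "http://www.example.org/email/index.xml.ID=herald",
    ):
        print(url)
        print("  prefix match:", strip_prefix_match(url))
        print("  exact match: ", strip_exact_match(url))
```

With the prefix test both URLs collapse to .../email/, so the second page
looks like a duplicate of the directory and is dropped; with the exact
test only the real default document is collapsed.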
Meanwhile I think I have no option but to allow the duplicate indexing of
.../ and .../index.xml - any ideas on how to work around the problem
would be appreciated ...
regards,
Malcolm.
+
| Malcolm Austen, Tel: +44(0) 1865 273216
| Oxford University Computing Services, Fax: +44(0) 1865 273275
| 13 Banbury Road, Email - [EMAIL PROTECTED]
| Oxford, OX2 6NN, England WWW - http://users.ox.ac.uk/~malcolm/
+