[htdig] Avoiding multiple (identical) search results

Ivan Trundle Wed, 17 Mar 1999 19:44:27 -0500

I've been following the threads on indexing only HTML files, since I have a similar 
problem. I have tried the solution Geoff offered earlier 
[limit_urls_to: html / and limit_normalized: ${start_url}], but it doesn't seem to work 
for me: every search result still appears four times. Have I overlooked something? 
Here is what I get:

http://www.alia.org.au/
http://www.alia.org.au/home.html
http://www.alia.org.au/alia/
http://www.alia.org.au/alia/home.html
(all leading to the same document)

Two issues arise. First, our Apache 1.3.4 server is configured to interpret requests for 
http://www.alia.org.au/ as either ../index.html or ../home.html. How can I stop 
ht://dig from indexing both the directory URL and the explicit home.html URL?

The second issue is related, and I suspect both stem from a misconfigured 
htdig.conf.

Our server stores web documents under /usr/local/www/alia/, but visitors should only 
see files from /alia/ inwards (for historical reasons, and to allow virtual servers 
alongside in other directories). The server will technically serve 
http://www.alia.org.au/alia/xyz.html, but the .../alia/... component is not 
required, and http://www.alia.org.au/xyz.html is preferred. Can I configure 
ht://dig to offer only the one result, or is this beyond the scope of ht://dig?
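
To illustrate the outcome being asked for, here is a minimal Python sketch of the 
URL collapsing described above. It is not ht://dig code or configuration; the helper 
name canonical_url and the two constants are assumptions made up for this example, 
based only on the URLs and server behaviour described in this message.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only (taken from this message, not from ht://dig):
DEFAULT_DOCS = ("index.html", "home.html")  # documents the server returns for a directory
REDUNDANT_PREFIX = "/alia/"                 # path component that is optional on this site


def canonical_url(url: str) -> str:
    """Map equivalent URLs onto one preferred form (hypothetical helper)."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    path = path or "/"
    # Drop the optional /alia/ prefix: /alia/xyz.html becomes /xyz.html
    if path.startswith(REDUNDANT_PREFIX):
        path = "/" + path[len(REDUNDANT_PREFIX):]
    # Drop a trailing default document: /home.html becomes /
    directory, _, leaf = path.rpartition("/")
    if leaf in DEFAULT_DOCS:
        path = directory + "/"
    return urlunsplit((scheme, netloc, path, query, fragment))


urls = [
    "http://www.alia.org.au/",
    "http://www.alia.org.au/home.html",
    "http://www.alia.org.au/alia/",
    "http://www.alia.org.au/alia/home.html",
]

# All four URLs from the search results collapse to a single entry.
unique = sorted({canonical_url(u) for u in urls})
print(unique)
```

The question is whether ht://dig can be configured to apply an equivalent mapping 
while indexing, so that only the preferred URL ends up in the database.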

For reference, I've set start_url to http://www.alia.org.au/

Thanks in advance, Ivan
------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.