Well, first off, you should probably add the various Apache FancyIndexing query strings (?C=M, ?C=N, etc.) to your exclude_urls patterns. Otherwise you'll index a slew of almost-duplicate documents.
I'm not very familiar with htdig or Apache. Could you clarify your answer?
Take a look at http://www.htdig.org/FAQ.html#q4.23. That FAQ entry gives a couple of solutions to the FancyIndexing problem.
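For example, something like the following in your config file should keep the sort links out of the index. The exact query strings depend on your Apache version (older servers emit ?N=, ?M=, ?S=, ?D= instead of ?C=...), so check what your directory listings actually generate; the /cgi-bin/ and .cgi entries shown are just the common defaults:

```
# htdig.conf -- skip Apache FancyIndexing sort links
# exclude_urls matches simple substrings, so "?C=" covers
# ?C=N, ?C=M, ?C=S and ?C=D in one pattern
exclude_urls: /cgi-bin/ .cgi ?C= ?N= ?M= ?S= ?D=
```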
As for your infinite loop, that does happen. That's why there's work on duplicate detection in 3.2, as well as the max_hop_count attribute:
http://www.htdig.org/attrs.html#max_hop_count
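A hop-count limit is just one line in the config file; the value below is purely illustrative, and you'd set it to whatever depth your site can plausibly reach:

```
# htdig.conf -- stop following links more than N hops
# away from any start_url (N=10 is only an example)
max_hop_count: 10
```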
So how do you propose preventing the cycles? I can't decrease max_hop_count because I don't know how deep my tree will be.
If you can neither fix the problem directly nor limit the dig by hop count, take a look at http://www.htdig.org/attrs.html#exclude_urls. If you know in advance where the loops will occur, you may be able to come up with URL patterns (e.g. repeated path segments such as /dir1/dir1/dir1) that exclude the offending URLs.
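Since exclude_urls patterns are plain substrings, a repeated-segment pattern is enough to break a symlink loop. /dir1/ here is just a stand-in for whatever directory actually loops on your server:

```
# htdig.conf -- break a known symlink cycle
# any URL containing the doubled segment is skipped,
# so /dir1/dir1/, /dir1/dir1/dir1/, etc. are all excluded
exclude_urls: /dir1/dir1/
```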
Jim
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

