This is an addition to Marc's post. Our main issue is that htdig gets into a cycle when it tries to index a directory
that contains a soft link pointing back to that directory, and it never seems to stop. Here are a few log lines and some information about our environment.
13:11:2:http://localhost/htdig_sltest/sl/?C=N&O=D: ******* size = 774
14:16:2:http://localhost/htdig_sltest/sl/test_gcs_tnx.html: size = 1040
15:12:2:http://localhost/htdig_sltest/sl/?C=M&O=A: *+***** size = 774
16:13:2:http://localhost/htdig_sltest/sl/?C=S&O=A: **+**** size = 774
17:27:3:http://localhost/htdig_sltest/sl/?C=M&O=D: ******* size = 774
18:28:3:http://localhost/htdig_sltest/sl/?C=S&O=D: ******* size = 774
19:20:3:http://localhost/htdig_sltest/sl/sl/?C=M&O=A: ++***** size = 777
Well, first off, you should probably add the various Apache FancyIndexing sort strings (?C=M, etc.) to your exclude_urls patterns. Otherwise, you get a slew of almost-duplicate documents.
As for your infinite loop, it does happen. That's why there's work on duplicate detection in 3.2 as well as the max_hop_count attribute:
http://www.htdig.org/attrs.html#max_hop_count
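As a rough sketch, the two suggestions above might look like this in the htdig configuration file. The hop limit of 10 is only an illustrative value, and the ?C= pattern assumes the sort links always begin with that string, as they do in the log above. The first two patterns are the documented defaults for exclude_urls and are kept because setting the attribute replaces the default list.

```
# Reject any URL containing one of these substrings.
# "?C=" is meant to catch the FancyIndexing sort links (?C=N&O=D, ?C=M&O=A, ...).
exclude_urls: /cgi-bin/ .cgi ?C=

# Stop following links more than this many hops from the start URL.
# 10 is an assumed example value; pick one that fits the depth of your site.
max_hop_count: 10
```

With a limit like this, the sl/sl/sl/... chain should still be followed for a few levels, but the dig ends instead of looping indefinitely.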
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

