I am trying to index a website using htdig and I am having a hard time understanding why some of my links are being followed and others aren't.
The site that I am trying to index is http://www.law.upenn.edu/ Links on the front page are followed properly. One of those links leads to http://www.law.upenn.edu/departments/, which htdig "pushes" and then requests. htdig then fails to follow the links in that second document but I can't figure out why -- it doesn't seem to be rejecting them, just silently ignoring them.

I have increased htdig's verbose output to -vvv and have posted two segments of the generated log here:

http://faculty.law.upenn.edu/~mwsnyder/log1.txt
http://faculty.law.upenn.edu/~mwsnyder/log2.txt

I am running htdig-3.1.6. These are the possibly relevant config options:

database_dir: /usr/local/htdig/db
start_url: http://www.law.upenn.edu/
limit_urls_to: ${start_url}
exclude_urls: /cgi-bin/ .cgi /bll/ulc/
bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
        .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi .css .pdf
max_head_length: 10000
max_doc_size: 200000
no_excerpt_show_top: true
search_algorithm: exact:1 synonyms:0.5 endings:0.1

Can anyone tell me how to convince htdig to follow the links within http://www.law.upenn.edu/departments ?

Thanks.

-- 
Matthew Snyder
University of Pennsylvania Law School

_______________________________________________
htdig-general mailing list
FAQ: http://htdig.sourceforge.net/FAQ.html
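As a rough cross-check on the URL-filtering options quoted above, the sketch below tests a link against limit_urls_to, exclude_urls and bad_extensions. It is not htdig's code: the function name filter_reason is made up for illustration, and it assumes the first two options are plain substring matches and the third is a suffix match on the URL path, which is only an approximation of what htdig does. The option values are copied from the config in the message.

```python
from urllib.parse import urlparse

# Values copied from the config options quoted in the message.
LIMIT_URLS_TO = ["http://www.law.upenn.edu/"]  # ${start_url} expanded by hand
EXCLUDE_URLS = ["/cgi-bin/", ".cgi", "/bll/ulc/"]
BAD_EXTENSIONS = [
    ".wav", ".gz", ".z", ".sit", ".au", ".zip", ".tar", ".hqx", ".exe",
    ".com", ".gif", ".jpg", ".jpeg", ".aiff", ".class", ".map", ".ram",
    ".tgz", ".bin", ".rpm", ".mpg", ".mov", ".avi", ".css", ".pdf",
]


def filter_reason(url):
    """Return why the assumed filters would reject url, or None if it passes.

    Hypothetical helper: substring and suffix matching are assumptions,
    and case handling is not modelled.
    """
    # Assumed: the URL must contain at least one limit_urls_to pattern.
    if not any(pattern in url for pattern in LIMIT_URLS_TO):
        return "not matched by limit_urls_to"
    # Assumed: any exclude_urls pattern appearing in the URL rejects it.
    for pattern in EXCLUDE_URLS:
        if pattern in url:
            return f"matched exclude_urls pattern {pattern!r}"
    # Assumed: bad_extensions is compared against the end of the URL path.
    path = urlparse(url).path
    for ext in BAD_EXTENSIONS:
        if path.endswith(ext):
            return f"bad extension {ext!r}"
    return None


if __name__ == "__main__":
    for link in [
        "http://www.law.upenn.edu/departments/",
        "http://www.law.upenn.edu/cgi-bin/search.cgi",
        "http://www.law.upenn.edu/catalog.pdf",
    ]:
        print(link, "->", filter_reason(link) or "passes all three filters")
```

Under these assumptions http://www.law.upenn.edu/departments/ itself passes all three filters, which fits the observation that links are ignored rather than rejected. The same check could be run against the links inside that page to rule these three options in or out.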

