Hi All,
We're running htdig 3.1.6 under FreeBSD 4.9. We index a number of local sites, which we do by using:
"start_url: `/usr/local/htdig/url_list.txt`"
and
"limit_urls_to: ${start_url}"
in htdig.conf.
'url_list.txt' is in the format:
" http://www.somewhere.com http://www.anothersite.com http://www.yetanothersite.com http://www.yougettheidea.com "
This has worked for quite a while - but recently, we caught htdig taking hours, and merrily wandering around the web - apparently fetching anything it could find.
After carefully aborting the current dig, removing the temporary files, and re-running the dig with more verbose logging, we see output similar to:
" url rejected: (level 1)http://home.microsoft.com/
A tag: pos = 2, position = ="http://home.netscape.com/">
url rejected: (level 1)http://home.netscape.com/ "
Which is good, because neither of those sites is in our url list - obviously one of the sites in our list links to them, and htdig rightly decided it wasn't going to go there :)
But then, we see stuff like:
" A tag: pos = 16, position = ="http://u.extreme-dm.com/?login=zzq9w8sak">
pushing http://u.extreme-dm.com/?login=zzq9w8sak
New server: u.extreme-dm.com, 80 "
Which looks like it's decided to go index u.extreme-dm.com?
Later on, we get:
" 374:1327:1:http://u.extreme-dm.com/?login=zzq9w8sak: not found "
So, obviously that login isn't valid any more - but why did htdig try to fetch the site? It's not in our url_list.txt - and it shouldn't appear in ${start_url} either, should it?
Eventually this happens for some other 'off site' site which has a whole page of links to other sites, and htdig will merrily go off and try to index them - even though they bear no relation to any urls in url_list.txt or, you would have thought, to anything in ${start_url}.
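To narrow this down, here is a rough sketch of a cross-check for the verbose log. It assumes limit_urls_to does a case-insensitive substring match of each pattern against the URL (our reading of the documentation, not verified against the 3.1.6 source), and that the "pushing" lines look like the ones quoted above. The function names (load_patterns, is_allowed, off_list_pushes) are made up for illustration and are not part of htdig.

```python
import re

# Assumed log format, based on the verbose output quoted above:
#   "pushing http://host/path"
PUSH_RE = re.compile(r"^\s*pushing\s+(\S+)")


def load_patterns(text):
    """Split the contents of url_list.txt on whitespace into lowercase patterns."""
    return [p.lower() for p in text.split()]


def is_allowed(url, patterns):
    """Assumption: a URL passes if it contains any pattern as a substring,
    compared case-insensitively."""
    u = url.lower()
    return any(p in u for p in patterns)


def off_list_pushes(log_lines, patterns):
    """Return the pushed URLs that match none of the patterns."""
    found = []
    for line in log_lines:
        m = PUSH_RE.match(line)
        if m and not is_allowed(m.group(1), patterns):
            found.append(m.group(1))
    return found


if __name__ == "__main__":
    patterns = load_patterns(
        "http://www.somewhere.com http://www.anothersite.com"
    )
    log = [
        "pushing http://www.somewhere.com/index.html",
        "pushing http://u.extreme-dm.com/?login=zzq9w8sak",
    ]
    for url in off_list_pushes(log, patterns):
        print(url)
```

If the substring assumption holds, any URL this prints is one htdig should have rejected according to url_list.txt, which would suggest that limit_urls_to is not ending up with the value we expect.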
Any help appreciated...
-Karl

