Hi All,


We're running htdig 3.1.6 under FreeBSD 4.9. We index a number of local sites, which we do by using:

"
start_url:              `/usr/local/htdig/url_list.txt`
"

and,

"
limit_urls_to:  ${start_url}
"

in htdig.conf.

'url_list.txt' is in the format:

"
http://www.somewhere.com
http://www.anothersite.com
http://www.yetanothersite.com
http://www.yougettheidea.com
"

This has worked for quite a while, but recently we caught htdig taking hours and merrily wandering around the web, apparently fetching anything it could find.

After carefully aborting the current dig, removing the temporary files, and re-running the dig with more verbose logging, we see output similar to:

"
url rejected: (level 1)http://home.microsoft.com/
A tag: pos = 2, position = ="http://home.netscape.com/";>

url rejected: (level 1)http://home.netscape.com/
"

That is good, because neither of those sites is in our URL list. Presumably a site in our list links to them, and htdig rightly decided it wasn't going to go there :)

But then, we see stuff like:

"
A tag: pos = 16, position = ="http://u.extreme-dm.com/?login=zzq9w8sak";>

pushing http://u.extreme-dm.com/?login=zzq9w8sak

New server: u.extreme-dm.com, 80
"

This looks like htdig has decided to index u.extreme-dm.com.

Later on, we get:

"
374:1327:1:http://u.extreme-dm.com/?login=zzq9w8sak:  not found
"

So that login evidently isn't valid any more, but why did htdig try to fetch the site at all? It's not in our url_list.txt, so it shouldn't appear in ${start_url} either, should it?

Eventually the same thing happens with some other off-site page that has a whole page of links to other sites, and htdig merrily goes off and tries to index them, even though they bear no relation to any URL in url_list.txt or, you would have thought, in ${start_url}.
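One check that might narrow this down is sketched below. It assumes that limit_urls_to is applied as a case-insensitive substring match of each pattern against the candidate URL, which is my reading of the attribute documentation rather than something verified against the 3.1.6 source. Under that assumption, any short or stray entry in the list (for example a bare "http://") would let every URL through. The function names and the sample list are made up for illustration; this is not htdig code.

```python
def clean(lines):
    """Strip whitespace and drop blank lines, as one might when reading url_list.txt."""
    return [line.strip() for line in lines if line.strip()]


def matching_patterns(url, patterns):
    """Return every pattern that occurs in url as a case-insensitive substring.

    ASSUMPTION: this mimics how limit_urls_to is believed to match; it is a
    stand-in for illustration, not the actual htdig implementation.
    """
    url_lower = url.lower()
    return [p for p in patterns if p.lower() in url_lower]


# Hypothetical list contents, based on the format shown earlier in this post.
patterns = clean([
    "http://www.somewhere.com\n",
    "http://www.anothersite.com\n",
    "\n",
    "http://www.yetanothersite.com\n",
    "http://www.yougettheidea.com\n",
])

# URLs taken from the verbose log output quoted above.
for url in [
    "http://home.netscape.com/",
    "http://u.extreme-dm.com/?login=zzq9w8sak",
]:
    hits = matching_patterns(url, patterns)
    print(url, "->", hits if hits else "no pattern matches")
```

To run it against the real list, the sample entries could be replaced with clean(open("/usr/local/htdig/url_list.txt")). If the off-site URL matches some entry, that entry is the one to look at; if nothing matches, the substring assumption does not explain what we are seeing.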

Any help appreciated...

-Karl

