On Thu, 20 May 2004, Sjan Evardsson wrote:

> I understand that when I see the "url rejected" in the output, that
> means it is in the excluded list. I have excluded /calendar/ from htdig,
Are you seeing "url rejected" messages associated with URLs that contain
/calendar/? Or are you referring to the message in a more general sense?

> however, it seems to find it necessary to repeatedly hit
> /calendar/index.php?page=XX&day=XX&month=XX&year=XX (where XX are
> repeatedly replaced with increasing values, so far from 2001 to 2007).

If by "hit" you mean that the page is actually being requested and
retrieved, then I believe something is wrong with your configuration, or
perhaps with your databases. If a URL matches an 'exclude_url' pattern,
the expected behavior is that htdig will not request the page at all.

> Is this because it is finding those links somewhere or is it just doing
> a zombie-like, brain-dead, recursive search on the url which is then
> rejected?

If htdig is trying to follow a link, it is finding it somewhere. Either it
is extracting the link from some other valid document, or the link is being
passed to htdig explicitly (e.g. via start_url).

The behavior you describe is what I would expect if /calendar/ were not
being correctly added as an 'exclude_url' pattern. If the calendars are of
the type I suspect, each page links to every day of the month, along with
links that move forward and backward to other months (and years). Without
something to exclude these URLs, the calendar script will keep generating
and serving calendar pages indefinitely.

Jim

_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
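Jim's two points (an excluded URL should never be requested, and an unexcluded calendar is an endless chain of links) can be illustrated with a small Python simulation. This is a sketch, not htdig's code: the example.com URLs, the month/year link scheme, and all function names are hypothetical, and it assumes that exclude_url patterns match as plain substrings of the URL.

```python
from collections import deque
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical site used only for this sketch.
HOME = "http://example.com/"
CALENDAR = "http://example.com/calendar/index.php"

# Modeled loosely on an htdig.conf line such as
#   exclude_url: /cgi-bin/ .cgi /calendar/
# Assumption: a URL is rejected if it contains any pattern as a plain substring.
EXCLUDE_PATTERNS = ["/cgi-bin/", ".cgi", "/calendar/"]


def is_excluded(url, patterns):
    """Return True if any exclude pattern occurs anywhere in the URL."""
    return any(p in url for p in patterns)


def calendar_url(year, month):
    """Build a URL for one month of the hypothetical calendar script."""
    return f"{CALENDAR}?{urlencode({'month': month, 'year': year})}"


def get_links(url):
    """Stand-in for link extraction from a fetched page."""
    if url == HOME:
        # The home page links into the calendar once.
        return [calendar_url(2004, 5)]
    if url.startswith(CALENDAR):
        # Every month page links to the previous and next month,
        # so this link graph has no end.
        query = parse_qs(urlparse(url).query)
        year, month = int(query["year"][0]), int(query["month"][0])
        prev_ym = (year, month - 1) if month > 1 else (year - 1, 12)
        next_ym = (year, month + 1) if month < 12 else (year + 1, 1)
        return [calendar_url(*prev_ym), calendar_url(*next_ym)]
    return []


def crawl(start, patterns, limit=1000):
    """Breadth-first crawl returning the URLs that were actually 'requested'.

    Excluded links are dropped when they are discovered, so they are never
    queued and never fetched. `limit` is only a safety cap for the sketch.
    """
    seen = {start}
    queue = deque([start])
    fetched = []
    while queue and len(fetched) < limit:
        url = queue.popleft()
        fetched.append(url)  # stands in for an HTTP request
        for link in get_links(url):
            if link in seen or is_excluded(link, patterns):
                continue
            seen.add(link)
            queue.append(link)
    return fetched


if __name__ == "__main__":
    print(len(crawl(HOME, EXCLUDE_PATTERNS)))  # 1: calendar never requested
    print(len(crawl(HOME, [], limit=200)))     # 200: trap runs until the cap
```

With the exclusion in place, only the home page is fetched. Without it, the crawl never runs out of new calendar URLs and stops only at the artificial cap, which mirrors the ever-increasing year values reported in the original question.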

