On Thu, 20 May 2004, Sjan Evardsson wrote:

> I understand that when I see the "url rejected" in the output, that
> means it is in the excluded list. I have excluded /calendar/ from htdig,

Are you seeing "url rejected" messages associated with URLs that contain
/calendar/ ? Or are you referring to the message in a more general sense?

> however, it seems to find it necessary to repeatedly hit
> /calendar/index.php?page=XX&day=XX&month=XX&year=XX (where XX are
> repeatedly replaced with increasing values, so far from 2001 to 2007).

If by "hit" you mean that the page is actually being requested and
retrieved, then I believe something is wrong with your configuration, or
perhaps with your databases. If a URL matches one of the patterns in
'exclude_urls', the expected behavior is that htdig will not request
the page at all.
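
To make that expectation concrete, here is a minimal sketch in Python. It
assumes the exclusion test is a plain substring match of each pattern
against the full URL, which is my reading of the documentation rather than
htdig's actual code. The function name, pattern list, and example.com URLs
are made up for illustration.

```python
def is_excluded(url, patterns):
    """Return True if any pattern occurs as a substring of the URL.

    Assumption: exclusion is a simple substring test, not a regex match.
    """
    return any(pattern in url for pattern in patterns)


# Hypothetical pattern list, similar to what a config might contain.
exclude_patterns = ["/calendar/", "/cgi-bin/", ".cgi"]

calendar_url = (
    "http://www.example.com/calendar/index.php"
    "?page=1&day=20&month=5&year=2004"
)
other_url = "http://www.example.com/docs/index.html"

# A URL containing /calendar/ should be rejected before any request is made.
print(is_excluded(calendar_url, exclude_patterns))
# A URL matching none of the patterns should be allowed through.
print(is_excluded(other_url, exclude_patterns))
```

If the calendar URLs pass a check like this with your real pattern list,
then the pattern is probably not reaching htdig in the form you intended.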

> Is this because it is finding those links somewhere or is it just doing
> a zombie-like, brain-dead, recursive search on the url which is then
> rejected?

If htdig is trying to follow a link, it is finding that link somewhere.
Either htdig is extracting it from some other valid document, or the link
is somehow being passed to htdig explicitly (e.g. via start_url).

The behavior you are describing is what I would expect if /calendar/ were
not being correctly added as an 'exclude_urls' pattern. If the calendars
are of the type that I suspect they are, each one has links to every day
of the month along with links that move forward and backward to different
months (and years). Without something to exclude these URLs, the calendar
script will continue to generate and serve up calendar pages indefinitely.
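
A toy simulation shows why such a calendar never runs out of pages. This
is only a sketch under assumptions: the calendar is modeled as month pages
that each link to the previous and next month, the URL shape is simplified
from the one you quoted, and the crawler is a generic breadth-first loop,
not htdig itself. All function names here are hypothetical.

```python
from collections import deque


def month_url(year, month):
    """Build a simplified calendar URL for one month (hypothetical shape)."""
    return f"/calendar/index.php?month={month}&year={year}"


def links_on(year, month):
    """Each month page links to the previous and the next month."""
    prev_ym = (year, month - 1) if month > 1 else (year - 1, 12)
    next_ym = (year, month + 1) if month < 12 else (year + 1, 1)
    return [prev_ym, next_ym]


def crawl(start, exclude_patterns, max_pages):
    """Breadth-first crawl; return how many pages were fetched.

    max_pages is a safety cap, since the calendar itself never ends.
    """
    seen = set()
    queue = deque([start])
    fetched = 0
    while queue and fetched < max_pages:
        ym = queue.popleft()
        if ym in seen:
            continue
        seen.add(ym)
        url = month_url(*ym)
        if any(pattern in url for pattern in exclude_patterns):
            continue  # rejected before any request is made
        fetched += 1
        queue.extend(links_on(*ym))
    return fetched


# No exclusion: the crawl only stops because of the cap.
print(crawl((2004, 5), [], max_pages=500))
# With /calendar/ excluded, the first URL is rejected and nothing is fetched.
print(crawl((2004, 5), ["/calendar/"], max_pages=500))
```

Without the exclusion the number of fetched pages is limited only by the
cap, which matches the steadily increasing month and year values you are
seeing.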

Jim

