>Getting the dead pages is easy; in the log they're marked "not found".
>Getting their sources is a little harder, but with V3.1.2 all you had to
Actually, it's a *lot* easier than this. Use the -s flag. At the end
of the dig, it will print the broken URLs and their referers. There's
even a contributed script in the archive that will help you do
various things with the list.
>Also: is there any documentation for the format of the log file? What are
>the three numbers at the beginning of the line, e.g.
>
> 14:2:0:<url>: not found
In order: index #, DocID, hopcount.
The index # is incremented at every step during that indexing run,
the DocID is the internal database ID #, and the hopcount is the
number of hops from the start_url.
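If you'd rather pull the dead pages out of the log yourself, here is a minimal sketch for splitting such a line into its fields. It assumes the layout shown in the example above, "<index>:<docid>:<hopcount>:<url>: <message>", with the URL written out literally. The regex, the LogEntry type and parse_line() are illustrative names of my own, not part of htdig. Also note that URLs can contain colons (e.g. a port number), so the message is assumed to begin at the first colon followed by a space.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout (based on the "14:2:0:<url>: not found" example):
#   <index>:<docid>:<hopcount>:<url>: <message>
# The URL may itself contain colons (e.g. "http://host:8080/"), so the
# message is taken to start at the first ": " (colon followed by a space).
LOG_LINE = re.compile(r"^(\d+):(\d+):(\d+):(.+?): (.*)$")


class LogEntry(NamedTuple):
    index: int     # incremented at every step of the indexing run
    doc_id: int    # internal database ID
    hopcount: int  # number of hops from start_url
    url: str
    message: str   # e.g. "not found"


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line doesn't match the assumed layout."""
    m = LOG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    index, doc_id, hops, url, message = m.groups()
    return LogEntry(int(index), int(doc_id), int(hops), url, message)


if __name__ == "__main__":
    print(parse_line("14:2:0:http://www.example.com/missing.html: not found"))
```

To list only the dead pages, keep the entries whose message is "not found". The -s flag remains the simpler route, since it also gives you the referers.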
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/