Hi all: I'm trying to figure out how to read my output from htdig. I can't see any errors, but the content isn't getting crawled.
start_url: http://careermatters.tvo.org \
           http://careermatters.tvo.org/highschool/show_groups.phtml \
           http://careermatters.tvo.org/afterhs/apprenticeship/college.phtml

The last URL isn't getting crawled, but I can't find any error messages in the output. Below is what seems relevant from the log file, though I'm not sure whether I'm missing the important parts.

Could someone point me to a page that explains what's in the -vvv log files? I've checked FAQ 5.26, but there's no "-" for a rejected URL.

Any help would be MUCH appreciated!

emma

First time the 3rd start_url is mentioned:

1:1:http://careermatters.tvo.org/highschool/show_groups.phtml pushed
1:1:http://careermatters.tvo.org/afterhs/apprenticeship/college.phtml pushed
pick: careermatters.tvo.org, # servers = 1

<snipped a bunch of stuff>

pick: careermatters.tvo.org, # servers = 1
2:31:0:http://careermatters.tvo.org/afterhs/apprenticeship/college.phtml:
Retrieval command for http://careermatters.tvo.org/afterhs/apprenticeship/college.phtml:
GET /afterhs/apprenticeship/college.phtml HTTP/1.0
User-Agent: htdig/3.1.6-011302 ([EMAIL PROTECTED])
Authorization: Basic Y2FyZWVyOm1hdHRlcnM=
Host: careermatters.tvo.org

Header line: HTTP/1.0 200 OK
Header line: Date: Wed, 06 Mar 2002 19:59:49 GMT
Header line: Server: Apache/1.3.22 (Darwin) PHP/4.0.6 mod_perl/1.26
Header line: Cache-Control: max-age=60
Header line: Expires: Wed, 06 Mar 2002 20:00:49 GMT
Header line: X-Powered-By: PHP/4.0.6
Header line: Content-Type: text/html
Header line: X-Cache: MISS from idefix
Header line: Connection: close
Header line:
returnStatus = 0
Read 8192 from document
Read 726 from document
Read a total of 8918 bytes

title: College Apprenticeship Programs: CareerMATTERS
href: http://careermatters.tvo.org/careermatters.css ()
Rejected: Extension is invalid!
url rejected: (level 1)http://careermatters.tvo.org/careermatters.css

<snipped a bunch of images>

anchor: content
image: http://careermatters.tvo.org/images/afterhs_banner.jpg
size = 8918
pick: careermatters.tvo.org, # servers = 1
3:1:1:http://careermatters.tvo.org/highschool/index.phtml:
Retrieval command for

