Hi all:

I'm trying to figure out how to read my output from htdig. I can't see any 
errors, but the content isn't getting crawled.

start_url: http://careermatters.tvo.org \
http://careermatters.tvo.org/highschool/show_groups.phtml \
http://careermatters.tvo.org/afterhs/apprenticeship/college.phtml

The last URL isn't getting crawled, but I can't find any error messages in 
the output. Below is what seems relevant from the log file, though I'm not 
sure whether I've left out the important parts. Could someone point me to a 
page that explains what's in the -vvv log files? I've checked FAQ 5.26, but 
my output has no "-" marking a rejected URL.

Any help would be MUCH appreciated!

emma


First time the 3rd start_url is mentioned:
        1:1:http://careermatters.tvo.org/highschool/show_groups.phtml pushed
        1:1:http://careermatters.tvo.org/afterhs/apprenticeship/college.phtml pushed
pick: careermatters.tvo.org, # servers = 1



<snipped a bunch of stuff>

pick: careermatters.tvo.org, # servers = 1
2:31:0:http://careermatters.tvo.org/afterhs/apprenticeship/college.phtml: 
Retrieval command for 
http://careermatters.tvo.org/afterhs/apprenticeship/college.phtml: GET 
/afterhs/apprenticeship/college.phtml HTTP/1.0

User-Agent: htdig/3.1.6-011302 ([EMAIL PROTECTED])

Authorization: Basic Y2FyZWVyOm1hdHRlcnM=

Host: careermatters.tvo.org
Header line: HTTP/1.0 200 OK
Header line: Date: Wed, 06 Mar 2002 19:59:49 GMT
Header line: Server: Apache/1.3.22 (Darwin) PHP/4.0.6 mod_perl/1.26
Header line: Cache-Control: max-age=60
Header line: Expires: Wed, 06 Mar 2002 20:00:49 GMT
Header line: X-Powered-By: PHP/4.0.6
Header line: Content-Type: text/html
Header line: X-Cache: MISS from idefix
Header line: Connection: close
Header line:
returnStatus = 0
Read 8192 from document
Read 726 from document
Read a total of 8918 bytes

title: College Apprenticeship Programs: CareerMATTERS
href: http://careermatters.tvo.org/careermatters.css ()

    Rejected: Extension is invalid!
url rejected: (level 1)http://careermatters.tvo.org/careermatters.css
<snipped a bunch of images>
anchor: content
image: http://careermatters.tvo.org/images/afterhs_banner.jpg
  size = 8918
pick: careermatters.tvo.org, # servers = 1
3:1:1:http://careermatters.tvo.org/highschool/index.phtml: Retrieval 
command for 
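
In case it helps anyone else reading a -vvv log, here is a minimal Python 
sketch (3.8 or later) that pulls out the lines that look relevant. The 
patterns are guessed only from the excerpt above ("pushed", the 
"N:N:N:url:" retrieval prefix, "Header line: HTTP/...", "Rejected:" and 
"url rejected:"), so they are assumptions about the log format, not 
something taken from the htdig documentation. The function name 
summarize_log is made up for this example.

```python
import re

# Patterns inferred only from the log excerpt quoted in this message;
# other htdig versions or verbosity levels may print different lines.
PUSHED = re.compile(r"^\s*\d+:\d+:(\S+) pushed\s*$")
RETRIEVAL = re.compile(r"^\d+:\d+:\d+:(\S+?):(?:\s|$)")
STATUS = re.compile(r"^Header line: (HTTP/\d\.\d \d{3}.*)$")
REASON = re.compile(r"^\s*Rejected: (.+)$")
REJECTED = re.compile(r"^url rejected: \(level \d+\)(\S+)")


def summarize_log(text):
    """Collect pushed, retrieved, and rejected URLs from htdig -vvv output."""
    summary = {"pushed": [], "retrieved": {}, "rejected": []}
    current = None  # URL whose retrieval is currently being logged
    reason = None   # most recent "Rejected:" reason seen, if any
    for line in text.splitlines():
        if m := PUSHED.match(line):
            summary["pushed"].append(m.group(1))
        elif m := RETRIEVAL.match(line):
            current = m.group(1)
            summary["retrieved"][current] = None  # status not seen yet
        elif (m := STATUS.match(line)) and current is not None:
            summary["retrieved"][current] = m.group(1).strip()
        elif m := REASON.match(line):
            reason = m.group(1).strip()
        elif m := REJECTED.match(line):
            summary["rejected"].append((m.group(1), reason))
            reason = None
    return summary
```

Calling summarize_log(open("htdig.log").read()) (the file name is just a 
placeholder) returns a dict, so a start_url can be looked up in "pushed", 
"retrieved" and "rejected" separately to see how far it got.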


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
