Thanks for answering, Geoff!!

>[-vvv output is] intended to be self-explanatory.

Which is unfortunate, because I can't find any errors in the output, and 
the page itself is "fine". I've even tried setting this URL as the only 
start_url (I posted earlier about line wrapping possibly being the reason 
for missing content on pages). But even with this questionable URL as the 
sole start_url, none of the content on the page gets indexed and none of 
the links get followed.

>>Authorization: Basic xxxxxxx
>
>You probably didn't want to post that to a mailing list. It's encrypted, 
>but not particularly rigorously.

You're right, I probably shouldn't have posted it. It's a fairly easy 
user:password to guess, as the site doesn't need to be super secret, just a 
little bit secret. ...
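
For anyone else reading along: as far as I understand it, the token after 
"Basic" is only base64-encoded user:password, so anyone can reverse it. 
Below is a minimal Python sketch of that. The function name 
decode_basic_auth and the credentials user:secret are made up for 
illustration; they are not the ones from my earlier post.

```python
import base64


def decode_basic_auth(header_value):
    """Split a 'Basic <token>' header value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    # base64 is an encoding, not encryption: no key is needed to reverse it.
    decoded = base64.b64decode(token.strip()).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


# Hypothetical credentials, built here only so the example is self-contained.
example = "Basic " + base64.b64encode(b"user:secret").decode("ascii")
print(decode_basic_auth(example))
```

So posting the header is about the same as posting the password itself.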

>>title: College Apprenticeship Programs: CareerMATTERS
>><snipped a bunch of images>
>
>OK, but what exactly is on the page? It certainly didn't find anything 
>significant to index or links other than the images you pointed out.

The page has four breadcrumb items, a bunch of image navigation buttons, 
eight left nav text links, and over 20 text links (in a list). None of the 
words on the page are getting put into the word db. For example, the page 
has a list of colleges, and none of the college names show up when I do a 
search.

>Either the HTML parser is missing a lot, or there isn't much on the page 
>to index.

I think it's the first option, which scares me. :(
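
One way I could cross-check this is to run the saved page through a 
different HTML parser and count what it finds. The sketch below uses 
Python's standard html.parser module. It is not ht://Dig's parser and says 
nothing about how htdig works internally; it only shows whether the words 
and links are recoverable from the markup at all. The class name 
TextAndLinks, the summarize helper, and the sample markup are all 
hypothetical.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect visible words and <a href> targets from an HTML document."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []
        self._skip = 0  # depth inside <script>/<style>, whose text is not content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(data.split())


def summarize(html):
    """Return (words, links) found in the given HTML string."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return parser.words, parser.links


# Made-up markup standing in for the real page, which is not shown here.
sample = (
    '<html><body><ul>'
    '<li><a href="/colleges/example.html">Example College</a></li>'
    '</ul></body></html>'
)
words, links = summarize(sample)
print(len(words), "words;", len(links), "links")
```

If a check like this finds the college names and links in the real page 
while htdig finds none, that would point at the parsing side rather than at 
the page being empty.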

emma

