According to Harrell, Roger:
> I checked the db.wordlist and found a -1 and a -2 record. This is the output
> from htdig -v 
> 
> 0:0:0:http://www.audiblefaith.com/staging/members/: ---++-----+-++++-----
> size = 23184
> 1:1:1:http://www.audiblefaith.com/staging/members/email:  not found
> 2:2:1:http://www.audiblefaith.com/staging/members/getemail:  redirect
> 3:3:1:http://www.audiblefaith.com/staging/members/index: ---**------****----- size = 23184
> 4:4:1:http://www.audiblefaith.com/staging/members/forums/:  not found
> 5:5:1:http://www.audiblefaith.com/staging/members/calendar/:  not found
> 6:6:1:http://www.audiblefaith.com/staging/members/biblesrc:  redirect
> 7:7:1:http://www.audiblefaith.com/staging/members/askbob/:  not found
> 
> Now when I ran it this time there are no "-" records in the htmerge -v
> output. Not sure why it would be different; I didn't change anything. I am
> running this to index only one page right now. It's totally acceptable that
> it's not following most of the links, but I don't understand why it wouldn't
> index the one page. 
> 
> Roger Harrell
> 
> >I don't know what "htmerge: ####" is supposed to mean.  Try htmerge with
> >-v and -s options, or even with -vv, and look carefully for anything
> >suspicious.  After htdig runs, does db.wordlist contain only word records,
> >or is there a "-" record in it too?  If there is, htdig is telling htmerge
> >to remove that document ID, so you'd need to look at the htdig -v output
> >to find out why.

The -1 and -2 are to delete document IDs 1 and 2 (members/email and
members/getemail), which is to be expected, as pages that are not found
or redirected are supposed to be deleted.  I'd expect you had -4,
-5, -6 & -7 records too.  However, document IDs 0 & 3 should remain.
There are no "-" records in db.wordlist after htmerge runs, because it
strips out these records after acting on them.
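To illustrate that rule, here is a rough Python sketch (not part of ht://Dig) that classifies the htdig -v lines you quoted. It assumes the second colon-separated field is the document ID, as the discussion of IDs 1 and 2 implies, and that a line ending in "not found" or "redirect" is one htdig marks for deletion. The function name expected_deletions is hypothetical.

```python
# Hypothetical helper, not part of ht://Dig: predicts which document IDs
# htdig would tell htmerge to delete, judging only from its -v output.
# Assumption: lines look like "count:docID:hopcount:URL: status", and a
# status of "not found" or "redirect" means the document gets a "-" record.

DELETED_STATUSES = ("not found", "redirect")


def expected_deletions(verbose_lines):
    """Return (deleted_ids, kept_ids) as sorted lists of ints."""
    deleted, kept = set(), set()
    for line in verbose_lines:
        # Split at most 3 times so the ":" in "http://" stays in the URL part.
        parts = line.split(":", 3)
        if len(parts) < 4 or not parts[1].isdigit():
            continue  # not a document line
        doc_id = int(parts[1])
        if line.rstrip().endswith(DELETED_STATUSES):
            deleted.add(doc_id)
        else:
            kept.add(doc_id)
    return sorted(deleted), sorted(kept)


base = "http://www.audiblefaith.com/staging/members/"
sample = [
    "0:0:0:" + base + ": ---++-----+-++++----- size = 23184",
    "1:1:1:" + base + "email:  not found",
    "2:2:1:" + base + "getemail:  redirect",
    "3:3:1:" + base + "index: ---**------****----- size = 23184",
    "4:4:1:" + base + "forums/:  not found",
    "5:5:1:" + base + "calendar/:  not found",
    "6:6:1:" + base + "biblesrc:  redirect",
    "7:7:1:" + base + "askbob/:  not found",
]

deleted, kept = expected_deletions(sample)
print("deleted:", deleted)  # deleted: [1, 2, 4, 5, 6, 7]
print("kept:", kept)        # kept: [0, 3]
```

This matches the expectation above: -1 through -7 except -3, with IDs 0 and 3 left in the database.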

I assume your start_url is set to http://www.audiblefaith.com/staging/members/
and limit_urls_to is set to the default of ${start_url}, so htdig will
attempt to index everything under your staging/members/ directory for
which it finds a link.  The "not found" messages are there because it's
actually attempting to follow the links, but they don't lead to working
pages.
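As a simplified model of why htdig tries those links, here is a small Python sketch. It treats limit_urls_to as a list of strings, at least one of which must appear somewhere in a URL; that is an assumption about htdig's matching, not a copy of its code, and allowed is a hypothetical name. Every link in your htdig -v output contains the start_url, so every one passes.

```python
# Simplified model (an assumption, not htdig's source): a URL is followed
# only if it contains at least one of the limit_urls_to patterns.

START_URL = "http://www.audiblefaith.com/staging/members/"
LIMIT_URLS_TO = [START_URL]  # the default, ${start_url}


def allowed(url, patterns=LIMIT_URLS_TO):
    """True if any pattern occurs as a substring of url."""
    return any(p in url for p in patterns)


links = [
    START_URL + "email",
    START_URL + "forums/",
    "http://www.audiblefaith.com/index.html",  # hypothetical URL outside the limit
]
for url in links:
    print(allowed(url), url)
```

Only the last, made-up URL would be skipped; everything under staging/members/ is fair game.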

The lines of -, + and * for documents 0 & 3 are caused by links that it's
finding in the documents (see FAQ 5.26).  However, is htdig finding any
words in these documents?  Are there any records in db.wordlist that
contain "i:0"?  If you run htdig -vvvv, do you see lists of words that
it indexes?  Do these documents contain any meta robots tags that would
allow following links but not indexing words (see FAQ 4.15)?  What does
the htmerge -v output say about document IDs 0 & 3?
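A throwaway script like the following could answer the first two questions. It assumes db.wordlist is plain text with one record per line, that a removal record is a "-" followed by the document ID (like the -1 and -2 you found), and that each word record has a tab-separated "i:<docID>" field. Check a few lines of your own file against those assumptions first; the function name and sample records are made up for illustration.

```python
# Hypothetical checker for db.wordlist (run after htdig, before htmerge).
# Assumed record formats, to be checked against your own file:
#   "-<docID>"                removal record
#   "<word>\ti:<docID>\t..."  word record
from collections import Counter


def scan_wordlist(lines):
    """Return (removal_ids, Counter of word records per doc ID)."""
    removals, words = [], Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("-"):
            removals.append(int(line[1:].split()[0]))
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("i:"):
                words[int(field[2:])] += 1
    return removals, words


# Made-up sample records; for the real file, pass open("db.wordlist") instead.
sample = [
    "-1",
    "-2",
    "audible\ti:0\tl:1\tw:100",
    "faith\ti:0\tl:2\tw:100",
    "members\ti:3\tl:5\tw:10",
]

removals, words = scan_wordlist(sample)
print("removal records:", removals)  # removal records: [1, 2]
for doc_id in (0, 3):
    print("doc", doc_id, "word records:", words[doc_id])
```

If IDs 0 and 3 show no word records at all, the questions about the -vvvv output and meta robots tags are the ones to pursue.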

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
