OK, thanks for the clarification on the "-" records. That makes sense, and
so do the results: IDs 0 and 3 are both the index file, which is the one
doc it should be indexing. You are correct in your assumption about
start_url and limit_urls_to.

There are words in docs 0 and 3 (again, the same document). When I run
htdig with -vvvv it is definitely reading the index file: it reads a total
of 23183 bytes and displays the file's title. I can't see where it lists
the words it finds, though. It rejects a lot of links because they are
outside the limits, which is expected. There are a bunch of records in
db.wordlist that contain i:0. There are no robots tags in the docs. I don't
see anything in the htmerge -v output about IDs 0 & 3. Here's what I meant
before about the format of the htmerge -v output.

Section:
htmerge: Sorting...
htmerge: Merging...
htmerge: 100:aargamle              
htmerge: 200:abitub              
htmerge: 300:acercame              
htmerge: 400:adina              
htmerge: 500:aeroporto              
htmerge: 600:ahilud              
htmerge: 700:alexandria              
htmerge: 800:alterna  

Later:
htmerge: 10
htmerge: 20
htmerge: 30
htmerge: 40
htmerge: 50
htmerge: 60
htmerge: 70
htmerge: 80
htmerge: 90
htmerge: 100
htmerge: 110
htmerge: 120
htmerge: 130

That's all that htmerge -v output looks like. There's nothing more to it.
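
In case it helps to double-check the i:0 records, here is a rough sketch of
how the records per document ID could be counted. It assumes db.wordlist is
plain text with one record per line and whitespace-separated fields, one of
which looks like i:<docid>; I have not confirmed that layout beyond seeing
i:0 in the file. The function name and the regular expression are my own,
not anything from htdig.

```python
import re
import sys
from collections import Counter

# Assumption: a record is one line of text and carries its document ID
# as a standalone whitespace-delimited field of the form "i:<digits>".
DOC_ID_FIELD = re.compile(r"(?<!\S)i:(\d+)(?!\S)")


def count_records_by_doc_id(lines):
    """Return a Counter mapping document ID -> number of matching records."""
    counts = Counter()
    for line in lines:
        match = DOC_ID_FIELD.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_ids.py db.wordlist
    with open(sys.argv[1], errors="replace") as wordlist:
        for doc_id, total in sorted(count_records_by_doc_id(wordlist).items()):
            print(doc_id, total)
```

If IDs 0 and 3 show up with nonzero counts before htmerge runs but not
after, that would at least narrow down where the words are going.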

Roger






>The -1 and -2 are to delete document IDs 1 and 2 (members/email and
>members/getemail), which is to be expected, as pages that are not found
>or redirected are supposed to be deleted.  I'd expect you also had -4,
>-5, -6 & -7 records too.  However, document IDs 0 & 3 should remain.
>There are no "-" records in db.wordlist after htmerge runs, because it
>strips out these records after acting on them.

>I assume your start_url is set to
>http://www.audiblefaith.com/staging/members/
>and limit_urls_to is set to the default of ${start_url}, so htdig will
>attempt to index everything under your staging/members/ directory for
>which it finds a link.  The "not found" messages are there because it's
>actually attempting to follow the links, but they don't lead to working
>pages.

>The line of -, + and * for documents 0 & 3 are caused by links that it's
>finding in the documents (see FAQ 5.26).  However, is htdig finding any
>words in these documents?  Are there any records in db.wordlist that
>contain "i:0"?  If you run htdig -vvvv, do you see lists of words that
>it indexes?  Do these documents contain any meta robots tags that would
>allow following links but not indexing words (see FAQ 4.15)?  What does
>the htmerge -v output say about document IDs 0 & 3?
