OK, thanks for the clarification on the "-" records. That makes sense. The
results make sense too, as IDs 0 and 3 are both the index file, which is the
one doc it should be indexing. You are correct in your assumption about
start_url and limit_urls_to.
There are words in docs 0 and 3 (again, the same document). When I run with
-vvvv it is definitely reading the index file: it read a total of 23183
bytes, and it displays the title of the index file. I can't see where it
lists the words it finds, though. It rejects a lot of links because they are
outside the limits, which is expected. There are a bunch of records in
db.wordlist that contain i:0. There are no robots tags in the docs. I don't
see anything in the htmerge -v output about IDs 0 & 3. Here's what I meant
before about the format of the htmerge -v output.
First section:
htmerge: Sorting...
htmerge: Merging...
htmerge: 100:aargamle
htmerge: 200:abitub
htmerge: 300:acercame
htmerge: 400:adina
htmerge: 500:aeroporto
htmerge: 600:ahilud
htmerge: 700:alexandria
htmerge: 800:alterna
Later:
htmerge: 10
htmerge: 20
htmerge: 30
htmerge: 40
htmerge: 50
htmerge: 60
htmerge: 70
htmerge: 80
htmerge: 90
htmerge: 100
htmerge: 110
htmerge: 120
htmerge: 130
That's all the htmerge -v output contains. There's nothing more to it.
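To put a number on the i:0 records mentioned above, a rough Python sketch
like the one below could tally db.wordlist records per document ID. It
assumes each record is one line of whitespace-separated fields, one of which
is i:<ID>; that layout is a guess based on the i:0 records, not something
taken from the ht://Dig docs, and count_doc_records is a made-up helper name.

```python
def count_doc_records(lines, doc_id):
    """Count records that carry an i:<doc_id> field.

    Assumption: one record per line, fields separated by whitespace,
    and the document ID stored as a field of the form i:<ID>.
    """
    tag = f"i:{doc_id}"
    count = 0
    for line in lines:
        # Compare whole fields so that i:0 does not also match i:10.
        if tag in line.split():
            count += 1
    return count
```

Passing it the open db.wordlist file and the ID 0 (and then 3) would show how
many word records each document has, before and after htmerge runs.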
Roger
>The -1 and -2 are to delete document IDs 1 and 2 (members/email and
>members/getemail), which is to be expected, as pages that are not found
>or redirected are supposed to be deleted. I'd expect you also had -4,
>-5, -6 & -7 records too. However, document IDs 0 & 3 should remain.
>There are no "-" records in db.wordlist after htmerge runs, because it
>strips out these records after acting on them.
>I assume your start_url is set to
>http://www.audiblefaith.com/staging/members/
>and limit_urls_to is set to the default of ${start_url}, so htdig will
>attempt to index everything under your staging/members/ directory for
>which it finds a link. The "not found" messages are there because it's
>actually attempting to follow the links, but they don't lead to working
>pages.
>The line of -, + and * for documents 0 & 3 are caused by links that it's
>finding in the documents (see FAQ 5.26). However, is htdig finding any
>words in these documents? Are there any records in db.wordlist that
>contain "i:0"? If you run htdig -vvvv, do you see lists of words that
>it indexes? Do these documents contain any meta robots tags that would
>allow following links but not indexing words (see FAQ 4.15)? What does
>the htmerge -v output say about document IDs 0 & 3?