According to Harrell, Roger:
> Ok thanks for the clarification on the - records. That makes sense. And the
> results make sense as IDs 0 and 3 are both the index file which is the one
> doc it should be indexing. You are correct in your assumption about the
> start_url and limit_urls.
>
> There are words in Doc 0 and 3 (again same document). When I run with -vvvv
> it is definitely reading the index file. It read a total of 23183 bytes. I
> can't see where it lists what words it finds.

With at least four -v options (e.g. -vvvv), htdig should print debugging
output like this:
word: This@821
word: page@829
word: tests@836
word: the@846
word: htdig@852
word: program@862

However, the presence of i:0 records in db.wordlist confirms the problem
is not in htdig, but apparently in htmerge. What platform are you running
this on?
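
If the -vvvv output is too noisy to scan by eye, something like the
following could pull out just the "word:" lines. This is only a rough
sketch: the function name extract_words is made up for illustration, and
it assumes every such line looks exactly like the samples above (a word,
an "@", then a number).

```python
import re

# Matches lines shaped like the samples above, e.g. "word: htdig@852".
# Assumption: the text after "@" is always a plain integer.
WORD_LINE = re.compile(r"^word:\s*(\S+)@(\d+)\s*$")

def extract_words(lines):
    """Return (word, number) pairs for every 'word:' line in the log."""
    found = []
    for line in lines:
        match = WORD_LINE.match(line.strip())
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found

# Small self-check using lines copied from the sample output.
sample = """word: This@821
word: page@829
some unrelated debugging line
word: htdig@852"""
for word, number in extract_words(sample.splitlines()):
    print(word, number)
```

To use it on a real run, save the htdig -vvvv output to a file and pass
the file's lines to extract_words() instead of the sample.
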
> It rejects a lot of links
> because they are not in the limits which is expected. It displays the title
> of the index file so it's definitely reading it. There are a bunch of
> records in db.wordlist that contain i:0. No robots tags in the docs. I don't
> see anything in htmerge -v output about IDs 0&3. Here's what I meant before
> about the format of the htmerge -v output.
>
> Section:
> htmerge: Sorting...
> htmerge: Merging...
> htmerge: 100:aargamle
> htmerge: 200:abitub
> htmerge: 300:acercame
> htmerge: 400:adina
> htmerge: 500:aeroporto
> htmerge: 600:ahilud
> htmerge: 700:alexandria
> htmerge: 800:alterna
>
> Later:
> htmerge: 10
> htmerge: 20
> htmerge: 30
> htmerge: 40
> htmerge: 50
> htmerge: 60
> htmerge: 70
> htmerge: 80
> htmerge: 90
> htmerge: 100
> htmerge: 110
> htmerge: 120
> htmerge: 130
>
> That's all that htmerge -v output looks like. There's nothing more to it.

What output do you get from htmerge -d -s? What are your settings for
database_dir, database_base and word_db? What files do you have in your
database_dir? Does htmerge create a db.docs.index there, or do you still
just have the db.docdb & db.wordlist?
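
A quick way to answer the last two questions might be a small script
along these lines. It is a sketch only: the three file names are the
ones mentioned in this thread (they depend on your database_base and
word_db settings), and check_database_dir is a hypothetical helper, not
part of ht://Dig.

```python
import os

# File names as mentioned in this thread; your configuration may differ.
EXPECTED_FILES = ["db.docdb", "db.wordlist", "db.docs.index"]

def check_database_dir(database_dir):
    """Map each expected file name to its size in bytes, or None if missing."""
    report = {}
    for name in EXPECTED_FILES:
        path = os.path.join(database_dir, name)
        report[name] = os.path.getsize(path) if os.path.isfile(path) else None
    return report

# Example call (the path is a placeholder for your own database_dir):
# for name, size in check_database_dir("/path/to/database_dir").items():
#     print(name, "missing" if size is None else size)
```

A missing db.docs.index after an htmerge run would point at htmerge not
finishing its job.
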
> >The -1 and -2 are to delete document IDs 1 and 2 (members/email and
> >members/getemail), which is to be expected, as pages that are not found
> >or redirected are supposed to be deleted. I'd expect you also had -4,
> >-5, -6 & -7 records too. However, document IDs 0 & 3 should remain.
> >There are no "-" records in db.wordlist after htmerge runs, because it
> >strips out these records after acting on them.
>
> >I assume your start_url is set to
> >http://www.audiblefaith.com/staging/members/
> >and limit_urls_to is set to the default of ${start_url}, so htdig will
> >attempt to index everything under your staging/members/ directory for
> >which it finds a link. The "not found" messages are there because it's
> >actually attempting to follow the links, but they don't lead to working
> >pages.
>
> >The lines of -, + and * for documents 0 & 3 are caused by links that it's
> >finding in the documents (see FAQ 5.26). However, is htdig finding any
> >words in these documents? Are there any records in db.wordlist that
> >contain "i:0"? If you run htdig -vvvv, do you see lists of words that
> >it indexes? Do these documents contain any meta robots tags that would
> >allow following links but not indexing words (see FAQ 4.15)? What does
> >the htmerge -v output say about document IDs 0 & 3?
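
To compare the number of "i:0" and "i:3" records in db.wordlist before
and after htmerge runs, a tally like the one below may help. It is a
sketch under one assumption: that db.wordlist is a text file with one
record per line, and that the document ID appears as a whitespace-
separated "i:<number>" field, as the "i:0" records discussed above
suggest. The function names are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the document ID shows up as a standalone "i:<number>" field.
DOC_ID_FIELD = re.compile(r"(?:^|\s)i:(\d+)(?=\s|$)")

def count_records_by_doc_id(lines):
    """Count how many records mention each document ID."""
    counts = Counter()
    for line in lines:
        match = DOC_ID_FIELD.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

def count_in_file(path):
    """Run the tally over a word list file, tolerating odd bytes."""
    with open(path, "r", errors="replace") as handle:
        return count_records_by_doc_id(handle)

# Example call (placeholder path):
# counts = count_in_file("/path/to/database_dir/db.wordlist")
# print(counts[0], counts[3])
```

If the counts for IDs 0 and 3 are non-zero before htmerge and zero
afterwards, that would support the idea that htmerge is dropping them.
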
--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
FAQ: http://htdig.sourceforge.net/FAQ.html