Re: [htdig] htmerge not creating db.words.db

Gilles Detillieux Mon, 04 Jun 2001 15:42:03 -0700
According to Harrell, Roger:
> Ok thanks for the clarification on the - records. That makes sense. And the
> results make sense as IDs 0 and 3 are both the index file which is the one
> doc it should be indexing. You are correct in your assumption about the
> start_url and limit_urls. 
> 
> There are words in Doc 0 and 3 (again same document). When I run with -vvvv
> it is definitely reading the index file. It read a total of 23183 bytes. I
> can't see where it lists what words it finds.

With at least 4 v options, htdig should spit out debugging output like...

word: This@821
word: page@829
word: tests@836
word: the@846
word: htdig@852
word: program@862
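
If the -vvvv output is too long to scan by eye, a short script can pull
out just the word lines. This is only a minimal sketch: it assumes the
output was saved to a file and that each word line has the
"word: <text>@<number>" shape shown above. The name extract_words and the
sample lines are illustrative, not part of htdig.

```python
import re

# Matches debug lines of the form "word: <text>@<number>", as in the sample above.
WORD_LINE = re.compile(r"^word:\s+(\S+)@(\d+)$")


def extract_words(lines):
    """Return (word, number) pairs for every line that looks like a word record."""
    found = []
    for line in lines:
        match = WORD_LINE.match(line.strip())
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


if __name__ == "__main__":
    # Illustrative input only; in practice, read the saved htdig -vvvv output.
    sample = ["word: This@821", "some other debug line", "word: page@829"]
    for word, number in extract_words(sample):
        print(word, number)
```

If this finds no word lines at all for a document, the problem is on the
htdig side rather than in htmerge.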

However, the presence of i:0 records in db.wordlist confirms the problem
is not in htdig, but apparently in htmerge.  What platform are you running
this on?
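
To double-check how many records a given document has, something like the
following could count them. It is a sketch under assumptions: it treats
db.wordlist as plain text with one record per line and whitespace-separated
fields, one of which is "i:<doc id>". The function name and the sample
records are made up for illustration.

```python
def count_records_for_doc(lines, doc_id):
    """Count records whose fields include the token "i:<doc_id>"."""
    token = f"i:{doc_id}"
    # Compare whole fields so that "i:1" does not also match "i:10".
    return sum(1 for line in lines if token in line.split())


if __name__ == "__main__":
    # Illustrative records only; the real field layout may differ.
    sample = ["htdig i:0", "page i:3", "the i:0", "other i:10"]
    for doc_id in (0, 3):
        print(doc_id, count_records_for_doc(sample, doc_id))
```

For the real file, pass an open file object instead of the sample list,
for example open("db.wordlist", errors="replace").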

> It rejects a lot of links
> because they are not in the limits which is expected. It displays the title
> of the index file so it's definitely reading it. There are a bunch of
> records in db.wordlist that contain i:0. No robots tags in the docs. I don't
> see anything in htmerge -v output about IDs 0&3. Here's what I meant before
> about the format of the htmerge -v output.
> 
> Section:
> htmerge: Sorting...
> htmerge: Merging...
> htmerge: 100:aargamle              
> htmerge: 200:abitub              
> htmerge: 300:acercame              
> htmerge: 400:adina              
> htmerge: 500:aeroporto              
> htmerge: 600:ahilud              
> htmerge: 700:alexandria              
> htmerge: 800:alterna  
> 
> Later:
> htmerge: 10
> htmerge: 20
> htmerge: 30
> htmerge: 40
> htmerge: 50
> htmerge: 60
> htmerge: 70
> htmerge: 80
> htmerge: 90
> htmerge: 100
> htmerge: 110
> htmerge: 120
> htmerge: 130
> 
> That's all that htmerge -v output looks like. There's nothing more to it.

What output do you get from htmerge -d -s?  What are your settings for
database_dir, database_base and word_db?  What files do you have in your
database_dir?  Does htmerge create a db.docs.index there, or do you still
just have the db.docdb & db.wordlist?

> >The -1 and -2 are to delete document IDs 1 and 2 (members/email and
> >members/getemail), which is to be expected, as pages that are not found
> >or redirected are supposed to be deleted.  I'd expect you also had -4,
> >-5, -6 & -7 records too.  However, document IDs 0 & 3 should remain.
> >There are no "-" records in db.wordlist after htmerge runs, because it
> >strips out these records after acting on them.
> 
> >I assume your start_url is set to http://www.audiblefaith.com/staging/members/
> >and limit_urls_to is set to the default of ${start_url}, so htdig will
> >attempt to index everything under your staging/members/ directory for
> >which it finds a link.  The "not found" messages are there because it's
> >actually attempting to follow the links, but they don't lead to working
> >pages.
> 
> >The line of -, + and * for documents 0 & 3 are caused by links that it's
> >finding in the documents (see FAQ 5.26).  However, is htdig finding any
> >words in these documents?  Are there any records in db.wordlist that
> >contain "i:0"?  If you run htdig -vvvv, do you see lists of words that
> >it indexes?  Do these documents contain any meta robots tags that would
> >allow following links but not indexing words (see FAQ 4.15)?  What does
> >the htmerge -v output say about document IDs 0 & 3?


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html