Bill Akins wrote:
> 
> All,
> 
> After running the following script, I can only find a very few
> documents...
> 
> /usr/bin/htdig -v -a -s -u user:password
> /usr/bin/htmerge -v -a -s
> cp /var/lib/htdig/db.docdb.work /var/lib/htdig/db.docdb
> mv /var/lib/htdig/db.docs.index.work /var/lib/htdig/db.docs.index
> mv /var/lib/htdig/db.words.db.work /var/lib/htdig/db.words.db

This is all very well, but what does it say in the output? Does it look
like it is pushing files (i.e. loading them into the DB)? Increase the
debug level to check (e.g. -vv or -vvv). Are you getting a lot of "URL
rejected" messages? Do they look sensible?
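
A rough way to answer those questions is to save the verbose output to a
file and count the rejection lines. The sketch below is only an
illustration: the helper names and the /tmp/htdig.log path are made up,
and since the exact wording of the rejection messages may differ between
htdig versions it simply matches "rejected" case-insensitively.

```shell
# Hypothetical helpers for inspecting a saved htdig log.

# Count the lines that mention a rejection. grep -c exits non-zero when
# the count is 0, so "|| true" keeps the helper usable under "set -e".
count_rejected() {
  grep -ci 'rejected' "$1" || true
}

# Show the first N rejection lines (default 20) so you can judge
# whether the rejected URLs look sensible.
show_rejected() {
  grep -i 'rejected' "$1" | head -n "${2:-20}"
}

# Usage (commented out; needs htdig and a working config):
#   /usr/bin/htdig -vvv -a -s -u user:password > /tmp/htdig.log 2>&1
#   count_rejected /tmp/htdig.log
#   show_rejected /tmp/htdig.log 20
```

If nearly every URL is being rejected, the problem is more likely in the
config than in the merge step.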

The important thing to check is that your start_url is well chosen and
that htdig is succeeding in crawling down through your site. Also check
your limit_normalized and limit_urls_to directives.
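
To see at a glance what those directives are currently set to, something
like the following could help. It is a minimal sketch: the function name
is made up, the config path in the usage line is an assumption (it varies
by install), and it only prints the first line of each directive, so
values continued over several lines with a trailing backslash will be
shown incomplete.

```shell
# Hypothetical helper: print the crawl-limiting directives from an
# htdig config file. Commented-out lines (starting with #) are skipped
# because the pattern requires the attribute name at the start of the
# line, optionally preceded by whitespace.
show_crawl_limits() {
  grep -E '^[[:space:]]*(start_url|limit_urls_to|limit_normalized)[[:space:]]*:' "$1"
}

# Usage (commented out; the path is an assumption):
#   show_crawl_limits /etc/htdig/htdig.conf
```

If start_url points at a page with few links, or limit_urls_to is
narrower than the site you expect to index, that would explain a small
document count.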

> 
> Am I doing something wrong here?
> 
> File sizes:
> 1522201600 db.docdb
> 1522201600 db.docdb.work
> 6144 db.docs.index    <====  Isn't this way too small for 500,000+
> documents?!
> 2072429267 db.wordlist
> 1830162432 db.words.db
> 

This looks pretty odd... I would junk the whole lot and start again. If
you run htdig with the "-i" option it will regenerate the DB every time
- you could do this until you are sure it is working properly.
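
As a sketch of what I mean, assuming the same binary paths as in your
script: the helper names and the RUN switch below are made up, and the
config path is only a placeholder. Without -a, htdig and htmerge work on
the live database files rather than the .work copies, so there is no
cp/mv step here.

```shell
# Hypothetical wrapper: print each command by default, or execute it
# when RUN=1 is set in the environment.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# Rebuild from scratch: -i makes htdig discard the old databases,
# -vv gives enough output to watch the crawl, -s prints statistics.
# Add "-u user:password" to the htdig line if the site needs it.
rebuild_index() {
  run /usr/bin/htdig -i -vv -s -c "$1" &&
  run /usr/bin/htmerge -v -s -c "$1"
}

# Usage (commented out; the config path is an assumption):
#   rebuild_index /etc/htdig/htdig.conf          # dry run, prints commands
#   RUN=1 rebuild_index /etc/htdig/htdig.conf    # actually rebuilds
```

Searches will see an incomplete index while this runs, so it is something
to do while testing rather than on a schedule.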

Rgds,

Owen Boyle

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
