Bill Akins wrote:
>
> All,
>
> After running the following script, I can only find a very few
> documents...
>
> /usr/bin/htdig -v -a -s -u user:password
> /usr/bin/htmerge -v -a -s
> cp /var/lib/htdig/db.docdb.work /var/lib/htdig/db.docdb
> mv /var/lib/htdig/db.docs.index.work /var/lib/htdig/db.docs.index
> mv /var/lib/htdig/db.words.db.work /var/lib/htdig/db.words.db

This is all very well, but what does the output say? Does it look
like it is pushing files (i.e. loading them into the DB)? Increase the
debug level to check (e.g. -vv or -vvv). Are you getting a lot of "URL
rejected" messages? Do they look sensible? The important thing to check
is that your start_url is well chosen and that htdig is succeeding in
crawling down through your site. Also check your limit_normalize and
limit_urls_to directives.

>
> Am I doing something wrong here?
>
> File sizes:
> 1522201600 db.docdb
> 1522201600 db.docdb.work
> 6144 db.docs.index  <==== Isn't this way too small for 500,000+
> documents?!
> 2072429267 db.wordlist
> 1830162432 db.words.db
>

This looks pretty odd... I would junk the whole lot and start again. If
you run htdig with the "-i" option it will regenerate the DB every
time - you could do this until you are sure it is working properly.

Rgds,

Owen Boyle
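
For what it's worth, the checks suggested above could be scripted roughly as follows. This is only a sketch under assumptions: the log path /tmp/htdig.log and the helper names reindex_verbose and count_rejected are made up for illustration, and the exact wording of htdig's rejection messages may differ between versions, so the match on "rejected" is a guess to adjust against your own output.

```shell
#!/bin/sh
# Sketch only. LOG is a hypothetical location for the captured output.
LOG=${LOG:-/tmp/htdig.log}

# Rebuild the database from scratch (-i) with extra verbosity (-vvv),
# saving everything htdig prints. Add "-u user:password" if your site
# needs it. This function is defined here but not called automatically.
reindex_verbose() {
    /usr/bin/htdig -i -vvv > "$LOG" 2>&1
}

# Print how many lines of a log file mention a rejection.
# Assumption: rejection lines contain the word "rejected" in some case.
count_rejected() {
    grep -ci 'rejected' "$1" || true
}
```

You would call reindex_verbose once and then run count_rejected "$LOG". If the count is large compared with the number of documents you expect, look at the rejected URLs themselves to see whether start_url and limit_urls_to are excluding most of the site.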

