Despite my efforts to reduce the size of my htdig index
files, they still seem large, so I thought I would ask whether
they are out of line. All of the documents are HTML files on a
single website. Here are my statistics:
Total documents: 59,546
Total doc size: 482 MB
db.docdb: 36 MB
db.words.db: 227 MB
db.docs.index: 5 MB
Together the index files total 268 MB, about 56% of the size
of the doc collection. Is this unusual?
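
For anyone checking my arithmetic, the 56% figure is simply the three index files summed and divided by the total doc size. Here is a quick sketch in Python that uses only the numbers listed above; the variable names are my own, and the per-file breakdown is just the same figures divided out.

```python
# Sizes in MB, exactly as reported in the statistics above.
index_files_mb = {
    "db.docdb": 36,
    "db.words.db": 227,
    "db.docs.index": 5,
}
total_doc_size_mb = 482

# Sum of the three index files.
index_total_mb = sum(index_files_mb.values())

# Index size as a fraction of the document collection.
ratio = index_total_mb / total_doc_size_mb

print(f"index total: {index_total_mb} MB")
print(f"index / docs: {ratio:.0%}")

# Share of the index taken up by each file.
for name, size_mb in index_files_mb.items():
    print(f"{name}: {size_mb / index_total_mb:.0%} of the index")
```

The word database (db.words.db) accounts for most of the total, so that is where any further reduction would have to come from.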
This is after trying to reduce the index size by:
- adding a 700-word bad_word_list
- setting max_head_length to only 50
- adding 18 of the most common entries for common_url_parts
I think it is marvelous that htdig handled this much
content. I'm just wondering if I'm missing something
that might reduce the size of the index files.
bobs
Bob Stayton 400 Encinal Street
Publications Architect Santa Cruz, CA 95060
Technical Publications voice: (831) 427-7796
Caldera International, Inc. fax: (831) 429-1887
email: [EMAIL PROTECTED]