According to Ace:
> I suppose the bad words list contains the most often used words of the
> language. Is it imaginable that htdig indexes all files to be indexed
> and finds out the most often used words and prints them out, so I could
> decide which words I want to exclude from the index to speed up searching?
htdig doesn't do this directly, but it could be done pretty easily by
analysing the db.wordlist file in 3.1.x, or running htdump and analysing
the db.worddump file in 3.2.x. Either way, you could write a simple awk
or Perl script that would total up the word counts.

> Would it help if I told you that the university of Leipzig has published
> word lists containing the 100, 1000 and 10000 most often used words of
> english, german, french and dutch at
> http://woclu2.informatik.uni-leipzig.de/html/wliste.html - no copyrights
> and no restrictions seem to be applied to the downloadable files?

Danke schoen! I've added this tip to FAQ 4.6.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
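For anyone who wants a starting point for the word-count totalling suggested in the reply, here is a rough sketch in Python rather than awk or Perl. It is not part of htdig, and the file layout is an assumption: the script treats the first whitespace-separated field of each line as the word, counts each line as one occurrence, and, if a field of the form c:N is present, uses N as the count for that line instead. Check a few lines of your own db.wordlist or db.worddump first and adjust the parsing if they look different. The function name total_word_counts and the latin-1 encoding are likewise just choices made for this example.

```python
import sys
from collections import Counter


def total_word_counts(lines):
    """Total up occurrences per word from word-list lines.

    Assumed layout (verify against your own file): the word is the first
    whitespace-separated field; an optional "c:N" field carries a count,
    otherwise the line counts as a single occurrence.
    """
    totals = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word = fields[0]
        count = 1
        for field in fields[1:]:
            if field.startswith("c:") and field[2:].isdigit():
                count = int(field[2:])
        totals[word] += count
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python wordfreq.py <word list file> [how many words to print]
    top_n = int(sys.argv[2]) if len(sys.argv) > 2 else 100
    with open(sys.argv[1], encoding="latin-1", errors="replace") as f:
        totals = total_word_counts(f)
    for word, count in totals.most_common(top_n):
        print(f"{count}\t{word}")
```

Run against the word list file, this prints the most frequent words with their totals, most frequent first. From that output you can pick the candidates to add to your bad words list.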

