Hi there!

I suppose the bad words list contains the most frequently used words of the
language. Would it be possible for htdig to go through all the files to be
indexed, find the most frequently used words and print them out, so that I
could decide which words to exclude from the index to speed up searching?
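If there is no built-in option for this, a rough count could be produced outside htdig. Below is a minimal sketch in Python. It assumes the documents are HTML or plain-text files on local disk. The script name and the function names (`count_words`, `read_files`, `top_words`) are hypothetical and not part of htdig. Tags are stripped with a crude regex rather than a real HTML parser.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Hypothetical helpers, NOT part of htdig: a standalone word-frequency count.
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters, including umlauts etc.
TAG_RE = re.compile(r"<[^>]*>")     # crude HTML tag stripping (assumption)


def count_words(texts, min_length=2):
    """Count lowercased word frequencies across an iterable of strings."""
    counts = Counter()
    for text in texts:
        text = TAG_RE.sub(" ", text)
        for word in WORD_RE.findall(text.lower()):
            if len(word) >= min_length:  # skip single letters
                counts[word] += 1
    return counts


def read_files(root, suffixes=(".html", ".htm", ".txt")):
    """Yield the text of every file under root with a matching suffix."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            yield path.read_text(encoding="utf-8", errors="replace")


def top_words(counts, n):
    """Return the n most frequent words, most frequent first."""
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: wordfreq.py ROOT [N]", file=sys.stderr)
    else:
        n = int(sys.argv[2]) if len(sys.argv) > 2 else 100
        counts = count_words(read_files(sys.argv[1]))
        # Print "word<TAB>count" so the list can be reviewed by hand.
        for word, freq in counts.most_common(n):
            print(f"{word}\t{freq}")
```

It could be run as `python wordfreq.py /path/to/docs 100`. The printed words could then be reviewed by hand, and the chosen ones copied into the bad words file. This assumes that file takes one word per line.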

Would it help if I told you that the University of Leipzig has published
word lists containing the 100, 1000 and 10000 most frequently used words of
English, German, French and Dutch at
http://woclu2.informatik.uni-leipzig.de/html/wliste.html? No copyright or
usage restrictions seem to apply to the downloadable files.

Peter Asemann


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
