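The minimum word length and stop word list are configurable at run time, in the server configuration, with no recompile needed. The exclusion of words that appear in more than 50% of the rows is a compile-time setting; alternatively, simply use boolean mode, which does not apply that threshold. Here are the settings to be aware of: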

ft_min_word_len=3
ft_stopword_file=/dev/null
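
As a rough sketch of how this might look in practice: the two settings above go in the [mysqld] section of my.cnf, and after a server restart the existing FULLTEXT indexes need rebuilding before the new values apply. The table name `articles` and the columns `id`, `title` and `body` are made up for illustration, and this assumes a MyISAM table with a FULLTEXT index on (title, body).

```sql
-- Hypothetical table and column names, for illustration only.

-- After changing ft_min_word_len or ft_stopword_file and restarting the
-- server, rebuild the FULLTEXT index so the new settings take effect.
REPAIR TABLE articles QUICK;

-- Boolean mode does not apply the 50% threshold, so a term that occurs
-- in most rows can still match.
SELECT id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('+mysql' IN BOOLEAN MODE);
```

SHOW VARIABLES LIKE 'ft_%'; is a quick way to check which values the running server actually picked up.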

--Casey

http://about.scriblio.net/
http://maisonbisson.com/


On Jun 1, 2009, at 11:13 AM, Mike Taylor wrote:

However, all of these oddities -- over-eager stop list, ignoring short
words, not counting words in more than half the rows -- can be sorted
out by configuration options.
