To apply statistical tools to the words. For example, say you have a large collection of news articles and you want to know which words are appearing more often than usual today. You could do a TermEnum limited to the documents that were indexed today, then do term enums for each of the previous 10 days, to find a mean and a standard deviation for each word's document frequency. The word that is the most standard deviations above its mean today gives you a good idea of which words are relevant to the active stories of the day.
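A very rough sketch of that first idea, using the plain IndexReader/TermEnum API. There is no way to limit a TermEnum itself to a subset of documents, so this sketch assumes one index per day instead (the /indexes/... paths and the field names are just made up for the example):

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermEnum;

  /** Sketch: find the term whose document frequency today is the most
   *  standard deviations above its mean over the previous 10 days.
   *  Assumes one index per day, e.g. /indexes/2005-12-27 (made-up paths). */
  public class TrendingTerms {
      public static void main(String[] args) throws Exception {
          IndexReader today = IndexReader.open("/indexes/2005-12-27");
          IndexReader[] history = new IndexReader[10];
          for (int i = 0; i < 10; i++) {
              history[i] = IndexReader.open("/indexes/2005-12-" + (26 - i));
          }

          Term best = null;
          double bestScore = Double.NEGATIVE_INFINITY;

          TermEnum terms = today.terms();
          while (terms.next()) {
              Term t = terms.term();
              int todayFreq = terms.docFreq();

              // document frequency of this term on each of the previous 10 days
              double sum = 0, sumSq = 0;
              for (int i = 0; i < history.length; i++) {
                  int f = history[i].docFreq(t);
                  sum += f;
                  sumSq += (double) f * f;
              }
              double mean = sum / history.length;
              double variance = sumSq / history.length - mean * mean;
              double stddev = Math.sqrt(Math.max(variance, 0.0));
              if (stddev == 0) continue;   // no variation to compare against

              double score = (todayFreq - mean) / stddev;
              if (score > bestScore) {
                  bestScore = score;
                  best = t;
              }
          }
          terms.close();

          System.out.println("most unusual term today: " + best
              + " (" + bestScore + " std devs above its mean)");

          today.close();
          for (int i = 0; i < history.length; i++) history[i].close();
      }
  }

With per-day indexes the docFreq() calls are cheap; if everything lives in one index you would instead have to intersect each term's TermDocs with the set of documents for each day, along the lines of the second sketch below.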
Or say you want to see which words in your corpus of news articles are related to the word 'foo'. You could find the frequency of every word in the index, counting only documents that match some TermQuery (like "contents:foo"), then compare those frequencies to the gross frequencies of every term in the index to see how strongly each term is related to foo. (A rough sketch of that follows the quoted message below as well.)

On 12/27/05, Phoenix <[EMAIL PROTECTED]> wrote:
>
> why ?
>
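Rough sketch of the second idea: collect the ids of the documents matching contents:foo, then for every term compare how often it appears in those documents with how often it appears in the index overall. The "lift" ratio used here is just one possible relatedness measure, and the path and field names are again made up:

  import java.util.BitSet;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;
  import org.apache.lucene.index.TermEnum;

  /** Sketch: for every term in the index, compare its frequency inside
   *  documents containing "foo" against its frequency in the index overall. */
  public class RelatedTerms {
      public static void main(String[] args) throws Exception {
          IndexReader reader = IndexReader.open("/indexes/news");

          // Collect the ids of documents matching contents:foo
          // (the brute-force equivalent of filtering by that TermQuery).
          BitSet fooDocs = new BitSet(reader.maxDoc());
          TermDocs td = reader.termDocs(new Term("contents", "foo"));
          while (td.next()) fooDocs.set(td.doc());
          td.close();

          int numDocs = reader.numDocs();
          int numFooDocs = fooDocs.cardinality();
          if (numFooDocs == 0) { reader.close(); return; }

          Term best = null;
          double bestLift = 0.0;

          TermEnum terms = reader.terms();
          while (terms.next()) {
              Term t = terms.term();
              if (!"contents".equals(t.field()) || "foo".equals(t.text())) continue;

              // how many of the foo documents also contain this term?
              int inFoo = 0;
              TermDocs docs = reader.termDocs(t);
              while (docs.next()) {
                  if (fooDocs.get(docs.doc())) inFoo++;
              }
              docs.close();

              // "lift": frequency among foo documents relative to overall frequency
              double pInFoo = (double) inFoo / numFooDocs;
              double pOverall = (double) terms.docFreq() / numDocs;
              double lift = pInFoo / pOverall;

              if (inFoo > 1 && lift > bestLift) {   // ignore one-off co-occurrences
                  bestLift = lift;
                  best = t;
              }
          }
          terms.close();

          System.out.println("term most related to foo: " + best + " (lift " + bestLift + ")");
          reader.close();
      }
  }

Note that this walks the postings of every term in the index, so it is only suitable for one-off analysis, not for anything interactive.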
