On Jan 10, 2010, at 11:18 AM, Grant Ingersoll wrote:

> Continuing my sweep through Mahout's clustering capabilities...
>
> In LDA, one of the input parameters is --numWords. I think this is supposed
> to be the total number of words seen in the collection, right? Thus, if I
> dumped Vectors from Lucene, for instance, the --numWords value should be the
> count of the number of values in the dictionary, right?

Answering my own question: yes, --numWords should be at least the number of words in the dictionary.
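
For anyone who wants to sanity-check the value before running the job, here is a rough Python sketch. It is not Mahout code: the `min_num_words` helper and the in-memory term-to-index dictionary are hypothetical, and it assumes term indices are 0-based, so that a dictionary with gaps in its indices needs a value larger than its entry count.

```python
def min_num_words(dictionary):
    """Return the smallest value that covers every entry in a term -> index map.

    Assumption (not from Mahout): indices are 0-based integers, so the
    largest index plus one must also fit under the returned value.
    """
    if not dictionary:
        return 0
    return max(len(dictionary), max(dictionary.values()) + 1)


# Hypothetical dictionary with a gap in its indices (3 and 4 are unused).
dictionary = {"mahout": 0, "lucene": 1, "cluster": 2, "topic": 5}

print(min_num_words(dictionary))
```

With a dense dictionary the result is simply the number of entries, which matches the answer above; the index check only matters if terms were pruned after indices were assigned.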
