I'm running trunk, using the data at http://people.apache.org/wikipedia/n2.tar.gz (a dump of 2302 documents from a Lucene index of Wikipedia; the chunks file in that same directory contains the original files). The vectors are L2-normalized.
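For clarity, L2 normalization here means dividing each vector by its Euclidean norm so that it has unit length. A minimal sketch of that step in plain Java follows; the class and method names are illustrative only, and this is neither Mahout's vector API nor the exact code used to produce the dump.

```java
// Illustrative only: plain Java arrays, not Mahout's Vector classes.
final class L2NormalizeSketch {

    // Returns a copy of v scaled to unit Euclidean (L2) length.
    // A zero vector has no direction, so it is returned as all zeros.
    static double[] normalize(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        double norm = Math.sqrt(sumOfSquares);

        double[] out = new double[v.length];
        if (norm == 0.0) {
            return out; // already all zeros
        }
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input, chosen only to show the call.
        double[] unit = normalize(new double[] {3.0, 4.0});
        System.out.println(unit[0] + " " + unit[1]);
    }
}
```

After this step every non-zero document vector has length 1, so an all-zero vector in the cluster output cannot simply be one of the normalized input documents.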

When I run K-Means on it via: org.apache.mahout.clustering.kmeans.KMeansDriver --input /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/part-full.txt --clusters /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/clusters --k 10 --output /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/k-output --distance org.apache.mahout.utils.CosineDistanceMeasure

I get the two directories seen in n2-output. The clusters-0 and clusters-1 files each contain a single vector, and that vector is all zeros.

I've also tried SquaredEuclidean, but to no avail.

Any insight into what I'm doing wrong would be appreciated.

Thanks,
Grant
