Still no dice.
On Jun 26, 2009, at 7:59 PM, Grant Ingersoll wrote:
We need to handle that separately from the various jobs, then. That was
one of the things that was different about the KMeansJob call.
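Something along these lines, maybe (just a sketch; the class and method
names below are made up for illustration, not existing Mahout code):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class OutputUtil {

  private OutputUtil() {
  }

  /** Delete the output path up front only when the caller explicitly asks for it. */
  public static void maybeOverwrite(Configuration conf, Path output, boolean overwrite)
      throws IOException {
    FileSystem fs = output.getFileSystem(conf);
    if (overwrite && fs.exists(output)) {
      fs.delete(output, true); // recursive delete of the previous run's output
    }
  }
}

That way the individual drivers never delete anything on their own; the
caller decides whether a previous run's output gets clobbered.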
On Jun 26, 2009, at 7:45 PM, Jeff Eastman wrote:
Found that the call in syntheticcontrol/kmeans.Job had true for the
overwrite-output flag. I don't think that was your problem, but
something similar must be at work.
Jeff Eastman wrote:
Running the latest trunk, I get a FileNotFoundException running
synthetic control on the $output/data file. It looks like the output got
deleted somewhere, but I haven't discovered where yet. Perhaps
Canopy is broken, or KMeans is purging its output?
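A crude way to track down where it disappears (plain Hadoop API, the
path below is just a placeholder for $output/data) is to check the
directory between steps:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckOutput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path data = new Path("output/data"); // placeholder for $output/data
    FileSystem fs = data.getFileSystem(conf);
    System.out.println(data + " exists: " + fs.exists(data));
    if (fs.exists(data)) {
      for (FileStatus status : fs.listStatus(data)) {
        System.out.println("  " + status.getPath());
      }
    }
  }
}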
Grant Ingersoll wrote:
I'm running trunk, using the data at http://people.apache.org/wikipedia/n2.tar.gz
(a dump of 2302 documents from a Lucene index of Wikipedia; the
chunks file in that same directory contains the original files).
Vectors are normalized using L2.
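(By L2 I mean each vector is scaled to unit Euclidean length; a
plain-Java sketch of that, independent of the Mahout API:)

public final class L2Norm {

  /** Scale v so its Euclidean length is 1; the zero vector is returned unchanged. */
  public static double[] normalize(double[] v) {
    double sumOfSquares = 0.0;
    for (double x : v) {
      sumOfSquares += x * x;
    }
    double norm = Math.sqrt(sumOfSquares);
    if (norm == 0.0) {
      return v.clone();
    }
    double[] result = new double[v.length];
    for (int i = 0; i < v.length; i++) {
      result[i] = v[i] / norm;
    }
    return result;
  }
}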
When I run K-Means on it via:
org.apache.mahout.clustering.kmeans.KMeansDriver
  --input /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/part-full.txt
  --clusters /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/clusters
  --k 10
  --output /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/k-output
  --distance org.apache.mahout.utils.CosineDistanceMeasure
I get the two directories seen in n2-output. The clusters-0 and
clusters-1 files both contain a single vector that is all zeros.
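A generic SequenceFile dump like the sketch below (just the plain
Hadoop reader, nothing Mahout-specific; the key/value classes are
whatever the job wrote) is how I'm inspecting what actually landed in
those files:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class DumpClusters {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]); // e.g. .../k-output/clusters-1/part-00000
    FileSystem fs = path.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
  }
}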
I've also tried SquaredEuclidean, but to no avail.
Any insight into what I'm doing wrong would be appreciated.
Thanks,
Grant
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search