// sequenceFile -> vector
mahout seq2sparse -i ../temp/input -o ../temp/vector/ -chunk 100 -wt TFIDF -ow

// vector -> canopy
mahout canopy -i /home/hduser/temp/vector/vector -o /home/hduser/temp/canopy/ -dm org.apache.mahout.common.distance.CosineDistanceMeasure -t1 0.032 -t2 0.008 -ow

// canopy -> kmeans
KMeansDriver.run(
    conf,                         // configuration
    vectorPath,                   // the directory pathname for input points
    canopyClusterPath,            // the directory pathname for initial & computed clusters
    kmeansPath,                   // the directory pathname for output points
    new CosineDistanceMeasure(),  // cosine distance measure
    0.1d,                         // the convergence delta value
    10,                           // the maximum number of iterations
    true,                         // run clustering
    false                         // false: run as MapReduce rather than sequentially
);

No exception is thrown. Thanks in advance.

At 2011-10-12 20:27:19, "Grant Ingersoll" <[email protected]> wrote:
>Can you share your actual commands?
>
>On Oct 12, 2011, at 6:21 AM, beneo_7 wrote:
>
>> Hi all,
>> When I create vectors from a Lucene index, Mahout uses NamedVector,
>> but what about creating vectors from a sequenceFile?
>>
>> Right now I create vectors from text with the following steps:
>>
>> step #1
>> text -> sequenceFile
>> key = text, value = text
>> I do not use seqdirectory because I want to put the String key into
>> the sequenceFile, not the doc ID.
>>
>> step #2
>> seq2sparse using TFIDF
>> as output I use tfidf-vectors/
>>
>> steps #3 and #4
>> canopy -> kmeans
>>
>> step #5
>> clusterDump
>>
>> I found that the vectors are
>> org.apache.mahout.math.RandomAccessSparseVector, so where can I find the
>> sequenceFile key?
>>
>> Thanks in advance
>
>--------------------------------------------
>Grant Ingersoll
>http://www.lucidimagination.com
>Lucene Eurocon 2011: http://www.lucene-eurocon.com
>
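Step #2 weights terms with TF-IDF. To show what that weighting does, here is a minimal in-memory sketch in plain Java using the textbook form tf × ln(N / df). It is not Mahout's implementation: seq2sparse's TFIDF weighting likely differs in detail (it derives from Lucene's similarity scoring and has options such as normalization), so these numbers will not match its output. The class `TfIdfSketch`, the method `tfidf`, and the sample keys are hypothetical. The sketch keeps each document's String key next to its sparse term-to-weight map, which is the pairing step #1 is trying to preserve.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical TF-IDF sketch: weight = raw term count * ln(N / df).
// Not Mahout's seq2sparse weighting; for illustration only.
final class TfIdfSketch {

    // Returns one sparse term -> weight map per document, keyed by the
    // document's own String key (insertion order of the input is kept).
    static Map<String, Map<String, Double>> tfidf(Map<String, String> docs) {
        int n = docs.size();
        Map<String, Integer> df = new HashMap<>();                      // term -> #docs containing it
        Map<String, Map<String, Integer>> tf = new LinkedHashMap<>();   // key -> term counts
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            Map<String, Integer> counts = new HashMap<>();
            for (String term : doc.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!term.isEmpty()) {          // leading whitespace yields an empty token
                    counts.merge(term, 1, Integer::sum);
                }
            }
            for (String term : counts.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
            tf.put(doc.getKey(), counts);
        }
        Map<String, Map<String, Double>> vectors = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> doc : tf.entrySet()) {
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Integer> e : doc.getValue().entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                weights.put(e.getKey(), e.getValue() * idf);
            }
            vectors.put(doc.getKey(), weights);
        }
        return vectors;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc-a", "mahout kmeans canopy");
        docs.put("doc-b", "mahout lucene index");
        // "mahout" appears in every document, so its weight is ln(2/2) = 0.
        System.out.println(tfidf(docs).get("doc-a"));
    }
}
```

A term that occurs in every document gets weight zero, which is why very common words contribute nothing to the cosine distance used in the later steps.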

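Steps #3 and #4 (canopy seeding k-means under cosine distance, with a convergence delta of 0.1 and at most 10 iterations in the call above) can be sketched in plain Java to show what the thresholds and the delta control. This is a simplified in-memory version on dense `double[]` vectors, not Mahout's CanopyDriver or KMeansDriver. The canopy centers here are the selected points themselves. Convergence is assumed to mean that no centroid moved by more than `convergenceDelta` under the same cosine distance. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical canopy -> k-means sketch with cosine distance.
// Simplified; not Mahout's CanopyDriver/KMeansDriver.
final class CanopyKMeansSketch {

    // Cosine distance = 1 - cos(a, b); assumes non-zero vectors.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Classic canopy center selection: take the next remaining point as a
    // center, then drop every remaining point within t2 of it.
    static List<double[]> canopyCenters(List<double[]> points, double t2) {
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] center = remaining.remove(0);
            centers.add(center);
            remaining.removeIf(p -> cosineDistance(center, p) < t2);
        }
        return centers;
    }

    // Index of the center closest to p under cosine distance.
    static int nearest(List<double[]> centers, double[] p) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int k = 0; k < centers.size(); k++) {
            double d = cosineDistance(centers.get(k), p);
            if (d < bestDist) { bestDist = d; best = k; }
        }
        return best;
    }

    // Lloyd-style k-means: assign points to the nearest center, move each
    // center to the mean of its points, and stop once no center moved more
    // than convergenceDelta or after maxIterations passes.
    static List<double[]> kMeans(List<double[]> points, List<double[]> initial,
                                 double convergenceDelta, int maxIterations) {
        List<double[]> centers = new ArrayList<>();
        for (double[] c : initial) centers.add(c.clone());
        int dim = points.get(0).length;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[centers.size()][dim];
            int[] counts = new int[centers.size()];
            for (double[] p : points) {
                int best = nearest(centers, p);
                counts[best]++;
                for (int i = 0; i < dim; i++) sums[best][i] += p[i];
            }
            boolean converged = true;
            for (int k = 0; k < centers.size(); k++) {
                if (counts[k] == 0) continue;   // empty cluster keeps its old center
                double[] updated = new double[dim];
                for (int i = 0; i < dim; i++) updated[i] = sums[k][i] / counts[k];
                if (cosineDistance(centers.get(k), updated) > convergenceDelta) converged = false;
                centers.set(k, updated);
            }
            if (converged) break;
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {1.0, 0.1, 0.0}, new double[] {0.9, 0.2, 0.0},
            new double[] {0.0, 0.1, 1.0}, new double[] {0.1, 0.0, 0.8});
        List<double[]> seeds = canopyCenters(points, 0.1);
        for (double[] c : kMeans(points, seeds, 0.001, 10)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

New centers are always picked from the points still in the list, and every point within t2 of a center is removed, so in this classic formulation t2 alone decides how many seeds k-means starts with. t1 only decides which points are loosely counted as canopy members, so the sketch leaves it out.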