// sequenceFile -> vector
mahout seq2sparse -i ../temp/input -o ../temp/vector/ -chunk 100 -wt TFIDF -ow


// vector -> canopy
mahout canopy -i /home/hduser/temp/vector/vector -o /home/hduser/temp/canopy/ 
-dm org.apache.mahout.common.distance.CosineDistanceMeasure -t1 0.032 -t2 0.008 
-ow 
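
For reference, the -t1 and -t2 values above are distances under CosineDistanceMeasure, which is 1 minus the cosine similarity, so both thresholds here are quite tight. Below is a minimal sketch of the textbook canopy pass with that distance. It assumes plain double[] points and non-zero vectors; the class and method names are hypothetical and are not Mahout API, and Mahout's own implementation may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Mahout.
class CanopySketch {

    // Cosine distance as 1 - cosine similarity (assumes non-zero vectors of equal length).
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Textbook canopy pass: returns, for each canopy, the indices of its member points.
    // Points closer than t1 to the center join the canopy; points closer than t2
    // are also removed from the candidate list, so they cannot start a new canopy.
    static List<List<Integer>> canopies(double[][] points, double t1, double t2) {
        List<Integer> remaining = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            remaining.add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            int center = remaining.remove(0); // next unassigned point becomes a center
            List<Integer> members = new ArrayList<>();
            members.add(center);
            List<Integer> next = new ArrayList<>();
            for (int idx : remaining) {
                double d = cosineDistance(points[center], points[idx]);
                if (d < t1) {
                    members.add(idx);
                }
                if (d >= t2) {
                    next.add(idx); // still a candidate center for later canopies
                }
            }
            remaining = next;
            result.add(members);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up toy points, only to exercise the thresholds used above.
        double[][] points = { {1.0, 0.0}, {1.0, 0.01}, {0.0, 1.0} };
        System.out.println(canopies(points, 0.032, 0.008));
    }
}
```

With thresholds this small, only nearly parallel vectors end up in the same canopy, so the number of canopies (and hence initial k-means clusters) can be large.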




// canopy -> kmeans
KMeansDriver.run(
    conf,                        // configuration
    vectorPath,                  // the directory pathname for input points
    canopyClusterPath,           // the directory pathname for initial & computed clusters
    kmeansPath,                  // the directory pathname for output points
    new CosineDistanceMeasure(), // cos
    0.1d,                        // the convergence delta value
    10,                          // the maximum number of iterations
    true,                        // run clustering
    false                        // execute map reduce
);




No exception is thrown. Thanks in advance.




At 2011-10-12 20:27:19,"Grant Ingersoll" <[email protected]> wrote:
>Can you share your actual commands?
>
>On Oct 12, 2011, at 6:21 AM, beneo_7 wrote:
>
>> Hi all,
>>    I create vectors from a Lucene index, and Mahout uses NamedVector for those, 
>> but what about creating vectors from a sequenceFile?
>> 
>>    Currently I create vectors from text with the following steps:
>> 
>>    step #1
>>        text -> sequenceFile
>>            key = text, value = text
>>            I do not use seqdirectory, because I want to put the String key into 
>> the sequenceFile, not the doc id
>> 
>>    step #2
>>        seq2sparse using TFIDF
>>            for the output I use tfidf-vectors/
>> 
>>    step #3 #4
>>        canopy -> kmeans
>> 
>>    step #5
>>        clusterDump
>> 
>>        I found that the vector is an 
>> org.apache.mahout.math.RandomAccessSparseVector; where can I find the 
>> sequenceFile key?
>> 
>>    Thanks in advance
>
>--------------------------------------------
>Grant Ingersoll
>http://www.lucidimagination.com
>Lucene Eurocon 2011: http://www.lucene-eurocon.com
>
