thx
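
For anyone finding this thread later, here is a minimal sketch of the suggestion quoted below. It reuses the paths from my seq2sparse command and adds -nv (the short form of --namedVector). The part-r-00000 file name and the -i flag for seqdumper are assumptions on my side; some Mahout releases take -s for the input file instead, so check `mahout seqdumper --help` first.

```shell
#!/bin/sh
# Sketch only: paths are copied from the commands quoted in this thread.
INPUT=../temp/input
OUTPUT=../temp/vector

# Assumption: the TF-IDF vectors end up in this part file; adjust to your output.
VECTORS="$OUTPUT/tfidf-vectors/part-r-00000"

# Same seq2sparse call as before, plus -nv (--namedVector) so that each vector
# is written as a NamedVector carrying the SequenceFile key as its name.
SEQ2SPARSE="mahout seq2sparse -i $INPUT -o $OUTPUT -chunk 100 -wt TFIDF -nv -ow"

# Dump the vectors to check that the names survived.
# Assumption: -i is the input flag; some releases use -s instead.
SEQDUMPER="mahout seqdumper -i $VECTORS"

if command -v mahout >/dev/null 2>&1; then
    # Word splitting of the command strings is intentional here.
    $SEQ2SPARSE && $SEQDUMPER | head -n 20
else
    echo "mahout not found on PATH; commands that would run:"
    echo "$SEQ2SPARSE"
    echo "$SEQDUMPER"
fi
```

If the names are kept, the dumped records should show the original String key next to each vector rather than a bare RandomAccessSparseVector.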

At 2011-10-12 21:51:51,"Grant Ingersoll" <[email protected]> wrote:
>
>On Oct 12, 2011, at 9:26 AM, beneo_7 wrote:
>
>> 
>> 
>> // sequenceFile -> vector
>> mahout seq2sparse -i ../temp/input -o ../temp/vector/ -chunk 100 -wt TFIDF 
>> -ow
>
>I think you need the --namedVector option to get/keep named vectors.  You 
>might try using the SequenceFile dumper (seqdumper) to examine the output of 
>this.
>
>(Also, in the future, this question is best asked on [email protected])
>
>> 
>> 
>> // vector -> canopy
>> mahout canopy -i /home/hduser/temp/vector/vector -o /home/hduser/temp/canopy/ 
>> -dm org.apache.mahout.common.distance.CosineDistanceMeasure -t1 0.032 -t2 
>> 0.008 -ow 
>> 
>> 
>> 
>> 
>> // canopy -> kmeans
>> KMeansDriver.run(
>>     conf,                        // configuration
>>     vectorPath,                  // the directory pathname for input points
>>     canopyClusterPath,           // the directory pathname for initial & computed clusters
>>     kmeansPath,                  // the directory pathname for output points
>>     new CosineDistanceMeasure(), // cos
>>     0.1d,                        // the convergence delta value
>>     10,                          // the maximum number of iterations
>>     true,                        // run clustering
>>     false                        // execute map reduce
>> );
>> 
>> 
>> 
>> 
>> No exception is thrown. Thanks in advance.
>> 
>> 
>> 
>> 
>> At 2011-10-12 20:27:19,"Grant Ingersoll" <[email protected]> wrote:
>>> Can you share your actual commands?
>>> 
>>> On Oct 12, 2011, at 6:21 AM, beneo_7 wrote:
>>> 
>>>> Hi all,
>>>>   When I create vectors from a Lucene index, Mahout uses NamedVector. 
>>>> But what about creating vectors from a SequenceFile?
>>>> 
>>>>   Right now I create vectors from text with the following steps:
>>>> 
>>>>   step #1
>>>>       text -> SequenceFile
>>>>           key = text, value = text
>>>>           I do not use seqdirectory, because I want to put the String key 
>>>> into the SequenceFile, not the doc ID
>>>> 
>>>>   step #2
>>>>       seq2sparse using TFIDF
>>>>           for the output I use tfidf-vectors/
>>>> 
>>>>   step #3, #4
>>>>       canopy -> kmeans
>>>> 
>>>>   step #5
>>>>       clusterDump
>>>> 
>>>>       I found that the vector is an 
>>>> org.apache.mahout.math.RandomAccessSparseVector. Where can I find the 
>>>> SequenceFile key?
>>>> 
>>>>   Thanks in advance
>>> 
>>> --------------------------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com
>>> Lucene Eurocon 2011: http://www.lucene-eurocon.com
>>> 
>
>--------------------------------------------
>Grant Ingersoll
>http://www.lucidimagination.com
>Lucene Eurocon 2011: http://www.lucene-eurocon.com
>
