You need to specify the Lucene analyzer that will be used to tokenize the text.
That said, I thought there was a default. What version of Mahout are
you using?
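A ClassNotFoundException for org.apache.lucene.analysis.Analyzer generally means the JVM that raised it could not find the Lucene core classes on its classpath. A quick way to check is a tiny standalone program that tries to load the class and prints the classpath it was started with. This is a minimal sketch; the class name `ClasspathCheck` and its `isLoadable` helper are hypothetical and not part of Mahout or Lucene. It only tests the classpath you launch it with, which may differ from the classpath a Hadoop task actually sees.

```java
import java.io.File;

// Hypothetical diagnostic helper (not part of Mahout or Lucene):
// reports whether a class can be loaded from this JVM's classpath.
public class ClasspathCheck {

    // Returns true if the named class can be found and linked by this class's loader.
    // initialize=false means the class's static initializers are not run.
    public static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a dependency of it (e.g. a superclass) is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "org.apache.lucene.analysis.Analyzer";
        System.out.println(target + " loadable: " + isLoadable(target));

        // Print each classpath entry so a missing lucene-core jar is easy to spot.
        String cp = System.getProperty("java.class.path");
        for (String entry : cp.split(File.pathSeparator)) {
            System.out.println("  " + entry);
        }
    }
}
```

Run it with the same classpath you use for the job, for example `java -cp <your classpath> ClasspathCheck`. If it reports `false`, the Lucene jar is not on that classpath.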


On Sep 16, 2011, at 5:41 AM, Jack He wrote:

> I tried the command below:
> mahout seqdirectory -i cluster/testdata -o cluster-seq -c UTF-8
> 
> The input file looks like this:
> 1 2 3 4 5
> 6 7 8 9 10
> 11 12 ... etc.
> Then I got a file named chunk-0 in the directory cluster-seq. It's almost
> the same as the input file.
> 
> Next, I ran the command below:
> mahout seq2sparse -i cluster-seq -o cluster-seq-vec
> 
> but it didn't work. It said:
> 
> Error: java.lang.ClassNotFoundException: org.apache.lucene.analysis.Analyzer
> 
> My English is poor; I hope you can understand what I said. Thanks for your
> help.
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-to-convert-a-text-file-to-vector-for-kmeans-tp3341486p3341486.html
> Sent from the Mahout Developer List mailing list archive at Nabble.com.

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com
