You need to specify the Lucene analyzer that will be used to tokenize the text. That being said, I thought there was a default. What version of Mahout are you using?
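For what it's worth, the exception in the quoted message means the JVM running the job could not load `org.apache.lucene.analysis.Analyzer`, Lucene's base analyzer class. A minimal sketch follows. It assumes you can compile and run a small class with the same classpath the `mahout` script uses. The `ClassCheck` name is hypothetical, and this is not a Mahout tool. It reports whether a given class name is loadable:

```java
public class ClassCheck {

    /**
     * Returns true if the named class can be found by this class's loader.
     * The 'false' argument skips static initialization, so only lookup is tested.
     */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            // Same failure the seq2sparse run reported: the class is not on this classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the error message when no argument is given.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.analysis.Analyzer";
        System.out.println(name + (isLoadable(name)
                ? " is on the classpath"
                : " is NOT on the classpath"));
    }
}
```

For example, `java -cp <your classpath> ClassCheck org.apache.lucene.analysis.Analyzer` reports whether the Lucene jar is visible on that classpath. The same check works for whatever analyzer class name you pass to seq2sparse.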
On Sep 16, 2011, at 5:41 AM, Jack He wrote:

> I've tried the command below:
> mahout seqdirectory -i cluster/testdata -o cluster-seq -c UTF-8
>
> The input file looks like:
> 1 2 3 4 5
> 6 7 8 9 10
> 11 12 ...etc
> Then I got a file named chunk-0 in the directory cluster-seq. It's almost
> the same as the input file.
>
> Next, I ran the command below:
> mahout seq2sparse -i cluster-seq -o cluster-seq-vec
>
> but it didn't work. It said:
>
> Error: java.lang.ClassNotFoundException: org.apache.lucene.analysis.Analyzer
>
> My English is poor, I hope you can understand what I said. Thanks for your
> help.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/how-to-convert-a-text-file-to-vector-for-kmeans-tp3341486p3341486.html
> Sent from the Mahout Developer List mailing list archive at Nabble.com.

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com
