Seq2Sparse maps to org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles in the core source branch.

Daniel.

On Mon, May 2, 2011 at 12:44 PM, Dipti Mathur <[email protected]> wrote:
> Hi All,
>
> I am trying to build a classifier for a set of data that I have collected
> myself. I am very new to mahout and would be very grateful if someone could
> help me with the steps to get started.
>
> The documents I have come across so far explain how to run the sample codes
> but when I tried converting my text to vectors ( using seqdirectory and
> seq2sparse) and run the kmeans algorithm, I get errors like below. I am not
> even able to find the source code to "kmeans" or "seq2sparse" executables to
> begin fixing the issue. Pointers to good reads will also help. Any help at
> all will be greatly appreciated.
>
> dipti@dipti-laptop:~$ mahout kmeans -i seq-output2 -c temp -o cluster-output
> -k 20 -cd 0.01 -x 20
> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20.2/
> HADOOP_CONF_DIR=/usr/lib/hadoop-0.20.2/conf
> 11/05/02 22:02:55 INFO common.AbstractJob: Command line arguments:
> {--clusters=temp, --convergenceDelta=0.01,
> --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
> --endPhase=2147483647, --input=seq-output2, --maxIter=20,
> --method=mapreduce, --numClusters=20, --output=cluster-output,
> --startPhase=0, --tempDir=temp}
> 11/05/02 22:02:55 INFO common.HadoopUtil: Deleting temp
> 11/05/02 22:02:55 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 11/05/02 22:02:55 INFO zlib.ZlibFactory: Successfully loaded & initialized
> native-zlib library
> 11/05/02 22:02:55 INFO compress.CodecPool: Got brand-new compressor
> Exception in thread "main" java.lang.ClassCastException: class
> org.apache.hadoop.io.IntWritable
> at java.lang.Class.asSubclass(Class.java:3039)
> at
> org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:86)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:96)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at
> org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:54)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Regards,
> Dipti Mathur
>
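
On reading the quoted exception: the top frame is java.lang.Class.asSubclass, and the message of the ClassCastException it throws is the name of the class that failed the check. The sketch below is plain Java, not Mahout code; AsSubclassDemo and describeMismatch are hypothetical names, and Integer and CharSequence are stand-ins chosen only to reproduce the same kind of message. Read this way, the trace would suggest that a class RandomSeedGenerator found in the input was IntWritable rather than the type it expected; the trace itself does not say what that expected type is.

```java
public class AsSubclassDemo {
    // Returns the ClassCastException message when `actual` is not a subtype
    // of `expected`, or null when the check passes.
    static String describeMismatch(Class<?> actual, Class<?> expected) {
        try {
            actual.asSubclass(expected);
            return null;
        } catch (ClassCastException e) {
            // The message is the toString() of the class that failed the check.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Integer does not implement CharSequence, so the check fails.
        System.out.println(describeMismatch(Integer.class, CharSequence.class));
        // String implements CharSequence, so the check passes.
        System.out.println(describeMismatch(String.class, CharSequence.class));
    }
}
```

The first call prints "class java.lang.Integer", which has the same shape as "class org.apache.hadoop.io.IntWritable" in the trace; the second prints "null".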
