Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: k-means-commandline (https://cwiki.apache.org/confluence/display/MAHOUT/k-means-commandline)
Comment:
https://cwiki.apache.org/confluence/display/MAHOUT/k-means-commandline?focusedCommentId=27844105#comment-27844105
Comment added by Jeff Eastman:
---------------------------------------------------------------------
The line "hdfs://RH01:9000/user/hadoop/testdata/synthetic_control.data not a SequenceFile" in your transcript indicates that you are running k-means directly on the synthetic control data file, which is a text file. k-means reads its input as Hadoop SequenceFiles of vectors, and the error surfaces in RandomSeedGenerator, which tries to read that input to sample the 25 initial clusters requested with -k 25. If you look at the synthetic control examples, you will see that they first call

    InputDriver.runJob(input, directoryContainingConvertedInput, "org.apache.mahout.math.RandomAccessSparseVector");

on this file and only then invoke k-means on its sequence file output.
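To make the conversion step more concrete, here is a minimal stdlib-only Java sketch of the per-line parsing it implies, assuming (as with synthetic_control.data) one record per line with whitespace-separated numeric columns. The class SyntheticControlParser and its methods are hypothetical names for illustration, not Mahout API. The real InputDriver job runs on Hadoop, wraps each row in the requested vector class (here RandomAccessSparseVector) and writes the result to a SequenceFile; this sketch does none of that and assumes Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: parses whitespace-separated numeric text rows into double arrays. */
public class SyntheticControlParser {
    // Columns are assumed to be separated by one or more whitespace characters.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Parses one line into a dense vector; returns null for blank lines. */
    public static double[] parseRow(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        String[] tokens = WHITESPACE.split(trimmed);
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Throws NumberFormatException if a column is not numeric.
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    /** Parses every non-blank line, producing one vector per record. */
    public static List<double[]> parseRows(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            double[] row = parseRow(line);
            if (row != null) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Small made-up sample; the blank line shows that empty records are skipped.
        List<String> sample = List.of("1.5 2.0 3.25", "", "  4 5 6  ");
        List<double[]> rows = parseRows(sample);
        System.out.println(rows.size() + " rows, " + rows.get(0).length + " columns"); // prints "2 rows, 3 columns"
    }
}
```

In practice you would still run InputDriver (or the synthetic control example job) so that the converted vectors land in HDFS in the SequenceFile format that k-means expects; the sketch only shows what "converting" a text row into a vector means.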
In reply to a comment by yexq:
[hadoop@RH01 ~]$ mahout kmeans -i testdata -o output -c clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cd 1 -k 25
Running on hadoop, using HADOOP_HOME=/mnt/userspace/hadoop-0.20.2
HADOOP_CONF_DIR=/mnt/userspace/hadoop-0.20.2/conf
12/04/16 12:51:48 INFO common.AbstractJob: Command line arguments: {--clusters=clusters, --convergenceDelta=1, --distanceMeasure=org.apache.mahout.common.distance.CosineDistanceMeasure, --endPhase=2147483647, --input=testdata, --maxIter=5, --method=mapreduce, --numClusters=25, --output=output, --overwrite=null, --startPhase=0, --tempDir=temp}
12/04/16 12:51:49 INFO common.HadoopUtil: Deleting output
12/04/16 12:51:49 INFO common.HadoopUtil: Deleting clusters
12/04/16 12:51:49 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/04/16 12:51:49 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/04/16 12:51:49 INFO compress.CodecPool: Got brand-new compressor
Exception in thread "main" java.lang.IllegalStateException: java.io.IOException: hdfs://RH01:9000/user/hadoop/testdata/synthetic_control.data not a SequenceFile
	at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:63)
	at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:87)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:101)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:58)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: hdfs://RH01:9000/user/hadoop/testdata/synthetic_control.data not a SequenceFile
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1455)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
	at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.<init>(SequenceFileIterator.java:58)
	at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:61)
	... 16 more
Who can help me?
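The "not a SequenceFile" message in this trace is raised by Hadoop's SequenceFile.Reader when a file does not begin with the ASCII magic bytes "SEQ" (which are followed by a version byte). A minimal stdlib-only Java sketch of that same leading-byte check, useful for sanity-checking a local copy of an input file, might look like the following. SequenceFileSniffer and its methods are hypothetical names for illustration; the sketch assumes Java 11+ (for InputStream.readNBytes) and only compares the first three bytes, so a positive result does not validate the rest of the header the way the real reader does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper: checks whether data starts with the SequenceFile magic bytes. */
public class SequenceFileSniffer {
    // Hadoop SequenceFiles start with the ASCII bytes 'S', 'E', 'Q', then a version byte.
    private static final byte[] MAGIC = {'S', 'E', 'Q'};

    /** Returns true if the leading bytes begin with "SEQ". */
    public static boolean hasSequenceFileMagic(byte[] leadingBytes) {
        if (leadingBytes.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(leadingBytes, MAGIC.length), MAGIC);
    }

    /** Reads only the first few bytes of a local file and checks them. */
    public static boolean fileHasSequenceFileMagic(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return hasSequenceFileMagic(in.readNBytes(MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Demo on an in-memory line of plain text, like one row of a text data file.
            byte[] textLine = "1.5 2.0 3.25\n".getBytes(StandardCharsets.US_ASCII);
            System.out.println("text line has SEQ header: " + hasSequenceFileMagic(textLine)); // prints "... false"
            return;
        }
        for (String arg : args) {
            Path path = Paths.get(arg);
            String verdict = fileHasSequenceFileMagic(path) ? "starts with SEQ header" : "not a SequenceFile";
            System.out.println(path + ": " + verdict);
        }
    }
}
```

For example, after copying the input locally with hadoop fs -get, running the sketch on synthetic_control.data would be expected to report it as not a SequenceFile, consistent with the exception in the trace.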