Suneel Marthi created MAHOUT-1469:
-------------------------------------
Summary: Streaming KMeans fails when executed in MapReduce mode and REDUCE_STREAMING_KMEANS is set to true
Key: MAHOUT-1469
URL: https://issues.apache.org/jira/browse/MAHOUT-1469
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.9
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Fix For: 1.0
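Centroids are not generated when Streaming KMeans is executed in MapReduce mode with the -rskm flag set. The reducer receives no input records, and BallKMeans then throws an IllegalArgumentException: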
{Code}
14/03/20 02:42:12 INFO mapreduce.StreamingKMeansThread: Estimated Points: 282
14/03/20 02:42:12 INFO mapred.JobClient: map 100% reduce 0%
14/03/20 02:42:14 INFO mapreduce.StreamingKMeansReducer: Number of Centroids: 0
14/03/20 02:42:14 WARN mapred.LocalJobRunner: job_local1374896815_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
    at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
    at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
14/03/20 02:42:14 INFO mapred.JobClient: Job complete: job_local1374896815_0001
14/03/20 02:42:14 INFO mapred.JobClient: Counters: 16
14/03/20 02:42:14 INFO mapred.JobClient: File Input Format Counters
14/03/20 02:42:14 INFO mapred.JobClient: Bytes Read=17156391
14/03/20 02:42:14 INFO mapred.JobClient: FileSystemCounters
14/03/20 02:42:14 INFO mapred.JobClient: FILE_BYTES_READ=41925624
14/03/20 02:42:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=25974741
14/03/20 02:42:14 INFO mapred.JobClient: Map-Reduce Framework
14/03/20 02:42:14 INFO mapred.JobClient: Map output materialized bytes=956293
14/03/20 02:42:14 INFO mapred.JobClient: Map input records=21578
14/03/20 02:42:14 INFO mapred.JobClient: Reduce shuffle bytes=0
14/03/20 02:42:14 INFO mapred.JobClient: Spilled Records=282
14/03/20 02:42:14 INFO mapred.JobClient: Map output bytes=1788012
14/03/20 02:42:14 INFO mapred.JobClient: Total committed heap usage (bytes)=217214976
14/03/20 02:42:14 INFO mapred.JobClient: Combine input records=0
14/03/20 02:42:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=163
14/03/20 02:42:14 INFO mapred.JobClient: Reduce input records=0
14/03/20 02:42:14 INFO mapred.JobClient: Reduce input groups=0
14/03/20 02:42:14 INFO mapred.JobClient: Combine output records=0
14/03/20 02:42:14 INFO mapred.JobClient: Reduce output records=0
14/03/20 02:42:14 INFO mapred.JobClient: Map output records=282
14/03/20 02:42:14 INFO driver.MahoutDriver: Program took 506269 ms (Minutes: 8.437816666666667)
{Code}
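The exception message suggests that BallKMeans.splitTrainTest holds out a fraction of its input as a test set and requires both the test set and the training set to be non-empty. The counters above show the map phase emitting 282 records while the reducer received 0 (Reduce input records=0, Reduce shuffle bytes=0), so such a check cannot pass. The Java sketch that follows is hypothetical: the class and method names are illustrative, not Mahout's source, and it assumes a test fraction of 0.1f, inferred from the logged value 10.000000149011612, which is consistent with 0.1f widened to double and multiplied by 100. The literal %.1f and %d in the message, with the arguments appended in square brackets, come from Guava's Preconditions.checkArgument, which only substitutes %s placeholders.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not Mahout's actual code) of the train/test split
 * precondition implied by the "Must have nonzero number of training and
 * test vectors" message in the log.
 */
class SplitTrainTestSketch {

    /** Number of vectors held out for testing, truncated toward zero. */
    static int numTest(float testProbability, int numVectors) {
        return (int) (testProbability * numVectors);
    }

    /** True only when both the test set and the training set are non-empty. */
    static boolean canSplit(float testProbability, int numVectors) {
        int test = numTest(testProbability, numVectors);
        return test > 0 && test < numVectors;
    }

    public static void main(String[] args) {
        // Assumed test fraction, inferred from the logged percentage.
        float testProbability = 0.1f;
        // Widening 0.1f to double before scaling yields a value just above 10.
        double percent = testProbability * 100.0;
        System.out.println("percent held out: " + percent);

        // 0 = what the reducer actually received; 282 = what the mappers emitted.
        for (int n : new int[] {0, 5, 282}) {
            System.out.printf(Locale.ROOT, "%d vectors -> numTest=%d, canSplit=%b%n",
                    n, numTest(testProbability, n), canSplit(testProbability, n));
        }
    }
}
```

Under this sketch, zero input vectors give a test count of 0, so the check fails for any test fraction; had the 282 map-side records reached the reducer, the split would have succeeded.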