[ https://issues.apache.org/jira/browse/MAHOUT-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1469:
----------------------------------

    Description: 
Centroids are not being generated when executed in MR mode with -rskm flag set.

{Code}
14/03/20 02:42:12 INFO mapreduce.StreamingKMeansThread: Estimated Points: 282
14/03/20 02:42:12 INFO mapred.JobClient:  map 100% reduce 0%
14/03/20 02:42:14 INFO mapreduce.StreamingKMeansReducer: Number of Centroids: 0
14/03/20 02:42:14 WARN mapred.LocalJobRunner: job_local1374896815_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148)
        at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
        at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
        at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
        at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
        at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
14/03/20 02:42:14 INFO mapred.JobClient: Job complete: job_local1374896815_0001
14/03/20 02:42:14 INFO mapred.JobClient: Counters: 16
14/03/20 02:42:14 INFO mapred.JobClient:   File Input Format Counters 
14/03/20 02:42:14 INFO mapred.JobClient:     Bytes Read=17156391
14/03/20 02:42:14 INFO mapred.JobClient:   FileSystemCounters
14/03/20 02:42:14 INFO mapred.JobClient:     FILE_BYTES_READ=41925624
14/03/20 02:42:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=25974741
14/03/20 02:42:14 INFO mapred.JobClient:   Map-Reduce Framework
14/03/20 02:42:14 INFO mapred.JobClient:     Map output materialized bytes=956293
14/03/20 02:42:14 INFO mapred.JobClient:     Map input records=21578
14/03/20 02:42:14 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/03/20 02:42:14 INFO mapred.JobClient:     Spilled Records=282
14/03/20 02:42:14 INFO mapred.JobClient:     Map output bytes=1788012
14/03/20 02:42:14 INFO mapred.JobClient:     Total committed heap usage (bytes)=217214976
14/03/20 02:42:14 INFO mapred.JobClient:     Combine input records=0
14/03/20 02:42:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=163
14/03/20 02:42:14 INFO mapred.JobClient:     Reduce input records=0
14/03/20 02:42:14 INFO mapred.JobClient:     Reduce input groups=0
14/03/20 02:42:14 INFO mapred.JobClient:     Combine output records=0
14/03/20 02:42:14 INFO mapred.JobClient:     Reduce output records=0
14/03/20 02:42:14 INFO mapred.JobClient:     Map output records=282
14/03/20 02:42:14 INFO driver.MahoutDriver: Program took 506269 ms (Minutes: 8.437816666666667)
{Code}

  was:
Centroids are not being generated when executed in MR with -rskm flag set.

{Code}
14/03/20 02:42:12 INFO mapreduce.StreamingKMeansThread: Estimated Points: 282
14/03/20 02:42:12 INFO mapred.JobClient:  map 100% reduce 0%
14/03/20 02:42:14 INFO mapreduce.StreamingKMeansReducer: Number of Centroids: 0
14/03/20 02:42:14 WARN mapred.LocalJobRunner: job_local1374896815_0001
java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148)
        at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
        at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
        at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
        at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
        at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
14/03/20 02:42:14 INFO mapred.JobClient: Job complete: job_local1374896815_0001
14/03/20 02:42:14 INFO mapred.JobClient: Counters: 16
14/03/20 02:42:14 INFO mapred.JobClient:   File Input Format Counters 
14/03/20 02:42:14 INFO mapred.JobClient:     Bytes Read=17156391
14/03/20 02:42:14 INFO mapred.JobClient:   FileSystemCounters
14/03/20 02:42:14 INFO mapred.JobClient:     FILE_BYTES_READ=41925624
14/03/20 02:42:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=25974741
14/03/20 02:42:14 INFO mapred.JobClient:   Map-Reduce Framework
14/03/20 02:42:14 INFO mapred.JobClient:     Map output materialized bytes=956293
14/03/20 02:42:14 INFO mapred.JobClient:     Map input records=21578
14/03/20 02:42:14 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/03/20 02:42:14 INFO mapred.JobClient:     Spilled Records=282
14/03/20 02:42:14 INFO mapred.JobClient:     Map output bytes=1788012
14/03/20 02:42:14 INFO mapred.JobClient:     Total committed heap usage (bytes)=217214976
14/03/20 02:42:14 INFO mapred.JobClient:     Combine input records=0
14/03/20 02:42:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=163
14/03/20 02:42:14 INFO mapred.JobClient:     Reduce input records=0
14/03/20 02:42:14 INFO mapred.JobClient:     Reduce input groups=0
14/03/20 02:42:14 INFO mapred.JobClient:     Combine output records=0
14/03/20 02:42:14 INFO mapred.JobClient:     Reduce output records=0
14/03/20 02:42:14 INFO mapred.JobClient:     Map output records=282
14/03/20 02:42:14 INFO driver.MahoutDriver: Program took 506269 ms (Minutes: 8.437816666666667)
{Code}


> Streaming KMeans fails when executed in MapReduce mode and REDUCE_STREAMING_KMEANS is set to true
> -------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1469
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1469
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.9
>            Reporter: Suneel Marthi
>            Assignee: Suneel Marthi
>             Fix For: 1.0
>
>
> Centroids are not being generated when executed in MR mode with -rskm flag set.
> {Code}
> 14/03/20 02:42:12 INFO mapreduce.StreamingKMeansThread: Estimated Points: 282
> 14/03/20 02:42:12 INFO mapred.JobClient:  map 100% reduce 0%
> 14/03/20 02:42:14 INFO mapreduce.StreamingKMeansReducer: Number of Centroids: 0
> 14/03/20 02:42:14 WARN mapred.LocalJobRunner: job_local1374896815_0001
> java.lang.IllegalArgumentException: Must have nonzero number of training and test vectors. Asked for %.1f %% of %d vectors for test [10.000000149011612, 0]
>       at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148)
>       at org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176)
>       at org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192)
>       at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107)
>       at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73)
>       at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37)
>       at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177)
>       at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> 14/03/20 02:42:14 INFO mapred.JobClient: Job complete: job_local1374896815_0001
> 14/03/20 02:42:14 INFO mapred.JobClient: Counters: 16
> 14/03/20 02:42:14 INFO mapred.JobClient:   File Input Format Counters 
> 14/03/20 02:42:14 INFO mapred.JobClient:     Bytes Read=17156391
> 14/03/20 02:42:14 INFO mapred.JobClient:   FileSystemCounters
> 14/03/20 02:42:14 INFO mapred.JobClient:     FILE_BYTES_READ=41925624
> 14/03/20 02:42:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=25974741
> 14/03/20 02:42:14 INFO mapred.JobClient:   Map-Reduce Framework
> 14/03/20 02:42:14 INFO mapred.JobClient:     Map output materialized bytes=956293
> 14/03/20 02:42:14 INFO mapred.JobClient:     Map input records=21578
> 14/03/20 02:42:14 INFO mapred.JobClient:     Reduce shuffle bytes=0
> 14/03/20 02:42:14 INFO mapred.JobClient:     Spilled Records=282
> 14/03/20 02:42:14 INFO mapred.JobClient:     Map output bytes=1788012
> 14/03/20 02:42:14 INFO mapred.JobClient:     Total committed heap usage (bytes)=217214976
> 14/03/20 02:42:14 INFO mapred.JobClient:     Combine input records=0
> 14/03/20 02:42:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=163
> 14/03/20 02:42:14 INFO mapred.JobClient:     Reduce input records=0
> 14/03/20 02:42:14 INFO mapred.JobClient:     Reduce input groups=0
> 14/03/20 02:42:14 INFO mapred.JobClient:     Combine output records=0
> 14/03/20 02:42:14 INFO mapred.JobClient:     Reduce output records=0
> 14/03/20 02:42:14 INFO mapred.JobClient:     Map output records=282
> 14/03/20 02:42:14 INFO driver.MahoutDriver: Program took 506269 ms (Minutes: 8.437816666666667)
> {Code}
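
The log shows the reducer receiving no input (Reduce input records=0, Number of Centroids: 0) even though the mapper emitted 282 records, and the exception is raised by the train/test split check. The following is only a minimal sketch of why an empty centroid list would trip such a check; it is not Mahout's source. The class name SplitCheckSketch, its method names, and the exact form of the condition are assumptions, and the 0.1 test fraction is inferred from the logged value of about 10 percent. The raw "%.1f %% of %d" in the logged message is consistent with Guava's Preconditions, whose message templates substitute only %s and append unmatched arguments in square brackets.

```java
// Hypothetical sketch; not the actual BallKMeans implementation.
public class SplitCheckSketch {

    // Number of vectors that would be held out for testing.
    static int numTest(float testProbability, int numVectors) {
        return (int) (testProbability * numVectors);
    }

    // Assumed precondition: both the training set and the test set are non-empty.
    static boolean isValidSplit(float testProbability, int numVectors) {
        int n = numTest(testProbability, numVectors);
        return n > 0 && n < numVectors;
    }

    // Throws when the split is impossible, using String.format so the
    // numeric placeholders are actually filled in.
    static void checkSplit(float testProbability, int numVectors) {
        if (!isValidSplit(testProbability, numVectors)) {
            throw new IllegalArgumentException(String.format(
                "Must have nonzero number of training and test vectors. "
                    + "Asked for %.1f %% of %d vectors for test",
                testProbability * 100, numVectors));
        }
    }

    public static void main(String[] args) {
        // 282 vectors, as estimated by the mapper: the split is possible.
        checkSplit(0.1f, 282);
        try {
            // Zero vectors, as seen by the reducer in the log: the check fails.
            checkSplit(0.1f, 0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, any run in which no centroids reach the reducer fails at this check regardless of the test fraction, which suggests the thing to investigate is why the 282 map output records never arrive as reduce input.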



--
This message was sent by Atlassian JIRA
(v6.2#6252)
