Hi Pragnesh,

Just wondering whether you have tried the steps on this page:
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
They were working fine about two weeks ago. I'll probably verify them tonight (with the latest code from trunk) and let you know.
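In the meantime, if it helps to narrow things down, here is a minimal sketch of a helper that scans a saved console log and counts the failed task attempts per exception message. It is not part of Mahout or Hadoop: the name `summarize_failures` and the regular expression are my own assumptions, based only on the shape of the JobClient lines in the log quoted in this thread, and it assumes the exception message sits on the first non-empty line after each "Status : FAILED" line.

```python
import re
from collections import Counter

# Assumed shape of a JobClient failure line, taken from the quoted log:
# "... INFO mapred.JobClient: Task Id : attempt_..._m_000000_0, Status : FAILED"
FAILED_RE = re.compile(r"Task Id : (attempt_\w+), Status : FAILED")


def summarize_failures(log_text):
    """Return a Counter mapping exception line -> number of failed attempts."""
    counts = Counter()
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if FAILED_RE.search(line):
            # Assumption: the exception message is the next non-empty line.
            for following in lines[i + 1:]:
                if following.strip():
                    counts[following.strip()] += 1
                    break
    return counts
```

Saving the console output to a file and passing its contents to `summarize_failures` should show at a glance whether every failed attempt reports "No clusters found" or "Cluster is empty!", or whether something else is mixed in.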
regards,
Joe.

On Tue, Oct 5, 2010 at 2:57 PM, Jeff Eastman <[email protected]> wrote:

> Hi Pragnesh,
>
> I really don't know what to suggest to you. I just did a new Mahout
> checkout and build, followed by uploading the synthetic_control.data file to
> a local Hadoop instance. The k-means job ran without incident. On a hunch, I
> also uploaded the file as testdata (not in directory testdata) and that
> worked too. I'm baffled why I can't duplicate this and suspect it is a local
> system issue. What OS are you running?
>
> If yours works from Eclipse but not from the command line, I wonder if you
> have done mvn clean build from the command line before you ran the CLI
> Mahout job? Eclipse compiles its bits into different directories and does
> not build the necessary job files. Other than that, I suggest checking your
> file system groups and permissions.
>
> If you find something that gets you running again, *please* post your
> solution so we can advise others who are experiencing the same error
> message.
>
> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>
>> [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502 ]
>>
>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>> ----------------------------------------------------------
>>
>> I am also getting the same exception with trunk code
>>
>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
>> 10/10/04 12:42:35 INFO mapred.JobClient: map 0% reduce 0%
>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>     at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> This runs fine from Eclipse, but when I try to run it from the command line
>> with Hadoop I see the following output, while $MAHOUT_HOME/bin/mahout
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job runs fine
>> without any error.
>>
>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on classpath, will use command-line arguments only
>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to process : 1
>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job: job_201010051117_0005
>> 10/10/05 12:26:10 INFO mapred.JobClient: map 0% reduce 0%
>> 10/10/05 12:26:26 INFO mapred.JobClient: map 100% reduce 0%
>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete: job_201010051117_0005
>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>> 10/10/05 12:26:29 INFO mapred.JobClient: Job Counters
>> 10/10/05 12:26:29 INFO mapred.JobClient: Launched map tasks=1
>> 10/10/05 12:26:29 INFO mapred.JobClient: Data-local map tasks=1
>> 10/10/05 12:26:29 INFO mapred.JobClient: FileSystemCounters
>> 10/10/05 12:26:29 INFO mapred.JobClient: HDFS_BYTES_READ=288374
>> 10/10/05 12:26:29 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=335470
>> 10/10/05 12:26:29 INFO mapred.JobClient: Map-Reduce Framework
>> 10/10/05 12:26:29 INFO mapred.JobClient: Map input records=600
>> 10/10/05 12:26:29 INFO mapred.JobClient: Spilled Records=0
>> 10/10/05 12:26:29 INFO mapred.JobClient: Map output records=600
>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input: output/data Out: output Measure: org.apache.mahout.common.distance.euclideandistancemeas...@136a43c t1: 80.0 t2: 55.0
>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to process : 1
>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job: job_201010051117_0006
>> 10/10/05 12:26:31 INFO mapred.JobClient: map 0% reduce 0%
>> 10/10/05 12:26:42 INFO mapred.JobClient: map 100% reduce 0%
>> 10/10/05 12:26:54 INFO mapred.JobClient: map 100% reduce 100%
>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete: job_201010051117_0006
>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>> 10/10/05 12:26:56 INFO mapred.JobClient: Job Counters
>> 10/10/05 12:26:56 INFO mapred.JobClient: Launched reduce tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient: Launched map tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient: Data-local map tasks=1
>> 10/10/05 12:26:56 INFO mapred.JobClient: FileSystemCounters
>> 10/10/05 12:26:56 INFO mapred.JobClient: FILE_BYTES_READ=13906
>> 10/10/05 12:26:56 INFO mapred.JobClient: HDFS_BYTES_READ=335470
>> 10/10/05 12:26:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=27844
>> 10/10/05 12:26:56 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=7131
>> 10/10/05 12:26:56 INFO mapred.JobClient: Map-Reduce Framework
>> 10/10/05 12:26:56 INFO mapred.JobClient: Reduce input groups=1
>> 10/10/05 12:26:56 INFO mapred.JobClient: Combine output records=0
>> 10/10/05 12:26:56 INFO mapred.JobClient: Map input records=600
>> 10/10/05 12:26:56 INFO mapred.JobClient: Reduce shuffle bytes=0
>> 10/10/05 12:26:56 INFO mapred.JobClient: Reduce output records=6
>> 10/10/05 12:26:56 INFO mapred.JobClient: Spilled Records=50
>> 10/10/05 12:26:56 INFO mapred.JobClient: Map output bytes=13800
>> 10/10/05 12:26:56 INFO mapred.JobClient: Combine input records=0
>> 10/10/05 12:26:56 INFO mapred.JobClient: Map output records=25
>> 10/10/05 12:26:56 INFO mapred.JobClient: Reduce input records=25
>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-0 Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to process : 1
>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job: job_201010051117_0007
>> 10/10/05 12:26:59 INFO mapred.JobClient: map 0% reduce 0%
>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>     at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_1, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>     at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id : attempt_201010051117_0007_m_000000_2, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>     at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete: job_201010051117_0007
>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>> 10/10/05 12:27:35 INFO mapred.JobClient: Job Counters
>> 10/10/05 12:27:35 INFO mapred.JobClient: Launched map tasks=4
>> 10/10/05 12:27:35 INFO mapred.JobClient: Data-local map tasks=4
>> 10/10/05 12:27:35 INFO mapred.JobClient: Failed map tasks=1
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters In: output/clusters-1 Out: output/clusteredPoints Distance: org.apache.mahout.common.distance.euclideandistancemeas...@136a43c
>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input Vectors: org.apache.mahout.math.VectorWritable
>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to process : 1
>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job: job_201010051117_0008
>> 10/10/05 12:27:38 INFO mapred.JobClient: map 0% reduce 0%
>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_1, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id : attempt_201010051117_0008_m_000000_2, Status : FAILED
>> java.lang.IllegalStateException: Cluster is empty!
>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete: job_201010051117_0008
>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>> 10/10/05 12:28:11 INFO mapred.JobClient: Job Counters
>> 10/10/05 12:28:11 INFO mapred.JobClient: Launched map tasks=4
>> 10/10/05 12:28:11 INFO mapred.JobClient: Data-local map tasks=4
>> 10/10/05 12:28:11 INFO mapred.JobClient: Failed map tasks=1
>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>
>> was (Author: pgradadia):
>> I am also getting the same exception with trunk code
>>
>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job: job_201010041038_0019
>> 10/10/04 12:42:35 INFO mapred.JobClient: map 0% reduce 0%
>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id : attempt_201010041038_0019_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>     at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> Kmeans clustering error
>>> -----------------------
>>>
>>> Key: MAHOUT-504
>>> URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>> Project: Mahout
>>> Issue Type: Bug
>>> Reporter: Zhen Guo
>>> Assignee: Robin Anil
>>> Fix For: 0.4
>>>
>>>
>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>> error appears. I tried the Canopy algorithm and it is fine. This error is
>>> from the Mapper. I am using trunk.
>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>
>
