Re: [jira] Issue Comment Edited: (MAHOUT-504) Kmeans clustering error

Joe Kumar Tue, 05 Oct 2010 19:06:22 -0700

Pragnesh,

I got the latest code from the repo and ran mvn clean and mvn install.
Then I followed the instructions in the wiki link I had mentioned below and
the kmeans clustering task on synthetic control executed just fine.
Please let me know if you face issues following the steps in the wiki.
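
A rough sketch of that sequence as a shell script, in case it helps. The
mahout command and the synthetic_control.data / testdata names are taken
from this thread; the step and has_job_file helpers are hypothetical names
made up for this sketch, and it assumes the command-line build leaves a
*.job archive somewhere under the checkout. By default the script only
prints each command; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
step() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper (not part of Mahout): succeeds only if at least one
# *.job archive exists under the given directory.
has_job_file() {
  [ -n "$(find "$1" -type f -name '*.job' 2>/dev/null | head -n 1)" ]
}

# 1. Rebuild from the command line, at the top of the Mahout checkout.
step mvn clean install

# 2. Sanity check (assumption: the build produces a *.job archive).
if ! has_job_file "${MAHOUT_HOME:-.}"; then
  echo "warning: no *.job file under ${MAHOUT_HOME:-.}" >&2
fi

# 3. Upload the input data to a testdata directory in HDFS.
step hadoop fs -mkdir testdata
step hadoop fs -put synthetic_control.data testdata

# 4. Run the synthetic control k-means example.
step "${MAHOUT_HOME:-.}/bin/mahout" \
  org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
```

If the warning in step 2 still shows up after a real build, that would
point at the build rather than at the data or the -c path.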


regards
Joe.

On Tue, Oct 5, 2010 at 4:34 PM, Joe Kumar <[email protected]> wrote:

> Hi Pragnesh,
>
> Just wondering if you tried the steps in
> https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data
> .
> It was working just fine about 2 weeks ago. I'll verify it tonight
> (with the latest code from trunk) and let you know.
>
> regards,
> Joe.
>
>
> On Tue, Oct 5, 2010 at 2:57 PM, Jeff Eastman
> <[email protected]> wrote:
>
>>  Hi Pragnesh,
>>
>> I really don't know what to suggest to you. I just did a new Mahout
>> checkout and build, followed by uploading the synthetic_control.data file to
>> a local Hadoop instance. The k-means job ran without incident. On a hunch, I
>> also uploaded the file as testdata (not in directory testdata) and that
>> worked too. I'm baffled why I can't duplicate this and suspect it is a local
>> system issue. What OS are you running?
>>
>> If yours works from Eclipse but not from the command line, I wonder if you
>> have done a clean Maven build (mvn clean install) from the command line
>> before you ran the CLI Mahout job? Eclipse compiles its bits into different
>> directories and does not build the necessary job files. Other than that, I
>> suggest checking your file system groups and permissions.
>>
>> If you find something that gets you running again, *please* post your
>> solution so we can advise others who are experiencing the same error
>> message.
>>
>>
>>
>> On 10/5/10 12:06 AM, pragnesh (JIRA) wrote:
>>
>>>     [
>>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917502#action_12917502]
>>>
>>> pragnesh edited comment on MAHOUT-504 at 10/5/10 3:05 AM:
>>> ----------------------------------------------------------
>>>
>>> I am also getting the same exception with trunk code:
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>
>>> This runs fine from Eclipse, and $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job also runs
>>> fine from the command line without any error.
>>>
>>> But when I try to run the k-means job from the command line with Hadoop,
>>> I see the following output:
>>>
>>> pragnesh-laptop% $MAHOUT_HOME/bin/mahout
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>> Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop/
>>> HADOOP_CONF_DIR=/etc/hadoop/conf.pseudo
>>> 10/10/05 12:26:05 WARN driver.MahoutDriver: No
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.props found on
>>> classpath, will use command-line arguments only
>>> 10/10/05 12:26:05 INFO kmeans.Job: Running with default arguments
>>> 10/10/05 12:26:06 INFO kmeans.Job: Preparing Input
>>> 10/10/05 12:26:06 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:07 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:26:09 INFO mapred.JobClient: Running job:
>>> job_201010051117_0005
>>> 10/10/05 12:26:10 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:26 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:28 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0005
>>> 10/10/05 12:26:29 INFO mapred.JobClient: Counters: 7
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_READ=288374
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
>>> 10/10/05 12:26:29 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Spilled Records=0
>>> 10/10/05 12:26:29 INFO mapred.JobClient:     Map output records=600
>>> 10/10/05 12:26:29 INFO kmeans.Job: Running Canopy to get initial clusters
>>> 10/10/05 12:26:29 INFO canopy.CanopyDriver: Build Clusters Input:
>>> output/data Out: output Measure:
>>> org.apache.mahout.common.distance.euclideandistancemeas...@136a43c t1:
>>> 80.0 t2: 55.0
>>> 10/10/05 12:26:29 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:29 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:26:30 INFO mapred.JobClient: Running job:
>>> job_201010051117_0006
>>> 10/10/05 12:26:31 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:26:42 INFO mapred.JobClient:  map 100% reduce 0%
>>> 10/10/05 12:26:54 INFO mapred.JobClient:  map 100% reduce 100%
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0006
>>> 10/10/05 12:26:56 INFO mapred.JobClient: Counters: 17
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Launched map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Data-local map tasks=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   FileSystemCounters
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_READ=13906
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_READ=335470
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=27844
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7131
>>> 10/10/05 12:26:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input groups=1
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine output records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map input records=600
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce shuffle bytes=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce output records=6
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Spilled Records=50
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output bytes=13800
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Combine input records=0
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Map output records=25
>>> 10/10/05 12:26:56 INFO mapred.JobClient:     Reduce input records=25
>>> 10/10/05 12:26:56 INFO kmeans.Job: Running KMeans
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-0 Out: output Distance:
>>> org.apache.mahout.common.distance.EuclideanDistanceMeasure
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: convergence: 0.5 max
>>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>>> Vectors: {}
>>> 10/10/05 12:26:56 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>> 10/10/05 12:26:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:26:57 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:26:58 INFO mapred.JobClient: Running job:
>>> job_201010051117_0007
>>> 10/10/05 12:26:59 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:08 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:14 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:23 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0007_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0007
>>> 10/10/05 12:27:35 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:27:35 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:27:35 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Clustering data
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Running Clustering
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: Input: output/data Clusters
>>> In: output/clusters-1 Out: output/clusteredPoints Distance:
>>> org.apache.mahout.common.distance.euclideandistancemeas...@136a43c
>>> 10/10/05 12:27:35 INFO kmeans.KMeansDriver: convergence: 0.5 Input
>>> Vectors: org.apache.mahout.math.VectorWritable
>>> 10/10/05 12:27:35 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 10/10/05 12:27:36 INFO input.FileInputFormat: Total input paths to
>>> process : 1
>>> 10/10/05 12:27:37 INFO mapred.JobClient: Running job:
>>> job_201010051117_0008
>>> 10/10/05 12:27:38 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/05 12:27:47 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:53 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_1, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:27:59 INFO mapred.JobClient: Task Id :
>>> attempt_201010051117_0008_m_000000_2, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Job complete:
>>> job_201010051117_0008
>>> 10/10/05 12:28:11 INFO mapred.JobClient: Counters: 3
>>> 10/10/05 12:28:11 INFO mapred.JobClient:   Job Counters
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Launched map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Data-local map tasks=4
>>> 10/10/05 12:28:11 INFO mapred.JobClient:     Failed map tasks=1
>>> 10/10/05 12:28:12 INFO driver.MahoutDriver: Program took 126495 ms
>>>
>>>       was (Author: pgradadia):
>>>     I am also getting the same exception with trunk code:
>>>
>>> 10/10/04 12:42:34 INFO mapred.JobClient: Running job:
>>> job_201010041038_0019
>>> 10/10/04 12:42:35 INFO mapred.JobClient:  map 0% reduce 0%
>>> 10/10/04 12:42:45 INFO mapred.JobClient: Task Id :
>>> attempt_201010041038_0019_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>        at
>>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:61)
>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>>  Kmeans clustering error
>>>> -----------------------
>>>>
>>>>                 Key: MAHOUT-504
>>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>>             Project: Mahout
>>>>          Issue Type: Bug
>>>>            Reporter: Zhen Guo
>>>>            Assignee: Robin Anil
>>>>             Fix For: 0.4
>>>>
>>>>
>>>> I tried the Kmeans algorithm on the Synthetic Control data, and the
>>>> following error appears. I also tried the Canopy algorithm, which works
>>>> fine. This error comes from the Mapper. I am using trunk.
>>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>>> java.lang.IllegalStateException: Cluster is empty!
>>>>        at
>>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>        at
>>>> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>
>>>
>>
>
>
>
>
