Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error

Jeff Eastman Thu, 16 Feb 2012 06:54:25 -0800

As I explain in the above post, the reason for this is historical. I agree it should be improved.


On 2/15/12 8:46 PM, Lance Norskog wrote:

Nobody reads the docs. If the program itself can do this, instead of
just barfing, it should. This is a case of Passive-Aggressive Error
Reporting.


On Wed, Feb 15, 2012 at 7:20 AM, Jeff Eastman
<[email protected]>  wrote:

The error message describes what the algorithm can see: that there are no
initial clusters. The wiki documentation seems reasonably clear on the use
of -k
(https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering) to
obtain them by sampling the input dataset, otherwise -c needs to contain
clusters produced by the user.
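
A minimal sketch of the kind of pre-flight check being asked for in this
thread, written as a shell wrapper rather than as a change to Mahout. The
function name check_kmeans_args is hypothetical and not part of Mahout. It
only looks for the -k flag described above, so it cannot tell whether the
-c path really holds user-supplied clusters.

```shell
# Hypothetical helper, not part of Mahout: scan a kmeans argument list and
# warn when -k is absent, since -c must then already hold initial clusters.
check_kmeans_args() {
  has_k=0
  for arg in "$@"; do
    case "$arg" in
      -k) has_k=1 ;;
    esac
  done
  if [ "$has_k" -eq 0 ]; then
    echo "No -k given: the -c path must already contain initial clusters." >&2
    echo "Add -k <number of clusters> to sample them from the input." >&2
    return 1
  fi
  return 0
}
```

A wrapper could call check_kmeans_args "$@" before ./bin/mahout kmeans "$@"
and stop early when it fails. Skip the check when the clusters in -c are
deliberately supplied by hand.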


On 2/14/12 8:04 PM, Lance Norskog wrote:

Could the error message describe the user's mistake?

On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman
<[email protected]>    wrote:

+1 bingo. K-Means is expecting you to provide the prior cluster centers in
-c. If you want it to sample from your input data you need to add the -k
option to tell it how many you want. This has been a constant part of the
API for some time, hence 0.4, 0.5 and 0.6 will all give the same error if
you overlook this argument.



On 2/14/12 8:56 AM, Suneel Marthi wrote:

You are not specifying the number of clusters that need to be generated.
Try running again with a -k <number of clusters> option. You also need to
specify that you want clustering to be done, with -cl.

For example:

./bin/mahout kmeans \
  -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ \
  -c ./examples/bin/work/clusters \
  -o ./examples/bin/work/reuters-kmeans \
  -x 10 -ow -k 20 -cl



________________________________
  From: qiang xu (Issue Comment Edited) (JIRA)<[email protected]>
To: [email protected]
Sent: Tuesday, February 14, 2012 10:48 AM
Subject: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error


[ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675 ]

qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
----------------------------------------------------------

This problem still exists in Mahout 0.5 and 0.6:
./bin/mahout kmeans \
  -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ \
  -c ./examples/bin/work/clusters \
  -o ./examples/bin/work/reuters-kmeans \
  -x 10 -ow
Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments: {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/, --maxIter=10, --method=mapreduce, --output=./examples/bin/work/reuters-kmeans, --overwrite=null, --startPhase=0, --tempDir=temp}
12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input: examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In: examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process : 1
12/02/14 20:56:06 INFO mapred.JobClient: Running job: job_201202131515_0122
12/02/14 20:56:07 INFO mapred.JobClient:  map 0% reduce 0%
12/02/14 20:56:16 INFO mapred.JobClient: Task Id : attempt_201202131515_0122_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
         at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
         at org.apache.hadoop.mapred.Child.main(Child.java:170)
It is really weird, because the clusters file is generated:
[root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/
Found 4 items
drwxr-xr-x   - root supergroup          0 2012-02-14 20:55 /user/root/examples/bin/work/clusters
drwxr-xr-x   - root supergroup          0 2012-02-14 20:56 /user/root/examples/bin/work/reuters-kmeans
drwxr-xr-x   - root supergroup          0 2012-02-14 20:29 /user/root/examples/bin/work/reuters-out-seqdir
drwxr-xr-x   - root supergroup          0 2012-02-14 20:32 /user/root/examples/bin/work/reuters-out-seqdir-sparse
[root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/clusters
Found 1 items
-rw-r--r--   2 root supergroup        139 2012-02-14 20:55 /user/root/examples/bin/work/clusters/part-randomSeed

I followed the guide at
https://cwiki.apache.org/MAHOUT/k-means-clustering.html

Kmeans clustering error
-----------------------

                  Key: MAHOUT-504
                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
              Project: Mahout
           Issue Type: Bug
             Reporter: Zhen Guo
             Assignee: Robin Anil
              Fix For: 0.4


I tried the Kmeans algorithm on the Synthetic Control data. The following
error appears. I tried the Canopy algorithm and it works fine. This error
is from the Mapper. I am using Trunk.
10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
     at org.apache.hadoop.mapred.Child.main(Child.java:170)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA
administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see:
http://www.atlassian.com/software/jira
