Re: Problems with KMeans clustering

Grant Ingersoll Mon, 27 Oct 2008 18:32:00 -0700

That is, I can reproduce the original problem.

On Oct 27, 2008, at 9:22 PM, Grant Ingersoll wrote:

OK, I can reproduce this.

On Oct 27, 2008, at 8:14 PM, Philippe Lamarche wrote:
I removed the apache-mahout-core-0.1-dev.jar file from {hadoop-home}/lib and
added apache-mahout-examples-0.1-dev.job instead.

My lib folder now contains:
-rw-r--r-- 1 hadoop hadoop 4506592 2008-10-27 19:59 apache-mahout-examples-0.1-dev.job
-rw-r--r-- 1 hadoop root    258337 2008-10-27 14:37 commons-cli-2.0-SNAPSHOT.jar
-rw-r--r-- 1 hadoop root     46725 2008-10-27 14:37 commons-codec-1.3.jar
-rw-r--r-- 1 hadoop root    279781 2008-10-27 14:37 commons-httpclient-3.0.1.jar
-rw-r--r-- 1 hadoop root     38015 2008-10-27 14:37 commons-logging-1.0.4.jar
-rw-r--r-- 1 hadoop root     26202 2008-10-27 14:37 commons-logging-api-1.0.4.jar
-rw-r--r-- 1 hadoop root    180792 2008-10-27 14:37 commons-net-1.4.1.jar
-rw-r--r-- 1 hadoop root    288534 2008-10-27 14:37 jets3t-0.6.0.jar
-rw-r--r-- 1 hadoop root    665638 2008-10-27 14:37 jetty-5.1.4.jar
-rw-r--r-- 1 hadoop root     11358 2008-10-27 14:37 jetty-5.1.4.LICENSE.txt
drwxr-xr-x 2 hadoop root      4096 2008-10-27 14:37 jetty-ext
-rw-r--r-- 1 hadoop root    121070 2008-10-27 14:37 junit-3.8.1.jar
-rw-r--r-- 1 hadoop root     14999 2008-10-27 14:37 junit-3.8.1.LICENSE.txt
-rw-r--r-- 1 hadoop root      9484 2008-10-27 14:37 kfs-0.1.3.jar
-rw-r--r-- 1 hadoop root     11358 2008-10-27 14:37 kfs-0.1.LICENSE.txt
-rw-r--r-- 1 hadoop root    391834 2008-10-27 14:37 log4j-1.2.15.jar
drwxr-xr-x 4 hadoop root      4096 2008-10-27 14:37 native
-rw-r--r-- 1 hadoop root     65261 2008-10-27 14:37 oro-2.0.8.jar
-rw-r--r-- 1 hadoop root     97689 2008-10-27 14:37 servlet-api.jar
-rw-r--r-- 1 hadoop root     15345 2008-10-27 14:37 slf4j-api-1.4.3.jar
-rw-r--r-- 1 hadoop root      1159 2008-10-27 14:37 slf4j-LICENSE.txt
-rw-r--r-- 1 hadoop root      8601 2008-10-27 14:37 slf4j-log4j12-1.4.3.jar
-rw-r--r-- 1 hadoop root     15010 2008-10-27 14:37 xmlenc-0.52.jar

When I try to run the synthetic example, I get:

[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar
/home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
java.lang.NoClassDefFoundError: org/apache/mahout/matrix/Vector
  at org.apache.mahout.clustering.syntheticcontrol.canopy.InputDriver.runJob(InputDriver.java:42)
  at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:77)
  at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:44)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
  at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
  at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.matrix.Vector
  at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
  at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
  ... 12 more

As far as I know, Hadoop currently has no additional classpath elements,
whether from conf/hadoop-env.sh or elsewhere.
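One way to confirm what the launching JVM can actually see is a tiny standalone check. Note that the command above passes the examples .jar from examples/dist to bin/hadoop jar, while the .job file now sits in {hadoop-home}/lib. If I read the stock bin/hadoop script correctly, it adds only lib/*.jar files to the client classpath, so a .job file placed there may not be picked up; treat that as an assumption to verify. The class ClassVisibilityCheck below is a hypothetical diagnostic, not part of Hadoop or Mahout, and uses only the JDK.

```java
import java.net.URL;

/**
 * Hypothetical diagnostic: reports whether a class is loadable from the
 * current classpath and, if so, which jar or directory provides it.
 */
public class ClassVisibilityCheck {

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassVisibilityCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError also covers NoClassDefFoundError for missing dependencies
            return false;
        }
    }

    /** Returns the URL the .class file would be loaded from, or null if absent. */
    static URL locate(String className) {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = ClassVisibilityCheck.class.getClassLoader();
        return loader == null
                ? ClassLoader.getSystemResource(resource)
                : loader.getResource(resource);
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.mahout.matrix.Vector";
        System.out.println(name + " loadable: " + isLoadable(name));
        System.out.println("found at: " + locate(name));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Run it with the same jars on -cp that the failing launch uses; if it prints false for org.apache.mahout.matrix.Vector, the core classes are simply not on that classpath.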

Did I understand correctly what you were saying?
On Mon, Oct 27, 2008 at 7:29 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
On Oct 27, 2008, at 4:26 PM, Philippe Lamarche wrote:

Hi,
My goal is to run the KMeans example. To do so, I must download the
synthetic control data and put it on the DFS in "testdata".

To make sure everything is OK, I started from a clean state on my
laptop.

I downloaded hadoop 0.18.1.

I changed the conf/hadoop-site.xml to this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-datastore/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

I changed JAVA_HOME in hadoop-env.sh.

I checked out Mahout from SVN at revision 708282.

I built both core and examples with the Ant build script.

I copied apache-mahout-core-0.1-dev.jar to {hadoop-home}/lib.
What happens if you don't do this but use the "job" file instead (ant job
in the examples dir)? I'm trying to replicate this, but am stuck at the
moment.
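Grant's suggestion relies on the job file carrying Mahout's dependencies with it. Assuming the examples job file follows the common Hadoop convention of packing dependency jars under a lib/ directory inside the archive (which, as I understand it, Hadoop's RunJar adds to the classpath after unpacking the jar), a quick way to check what it actually bundles is to list those entries. JobFileInspector is a hypothetical helper that uses only java.util.jar from the JDK.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: lists the dependency jars bundled under lib/ inside
 * a Hadoop job file (a job file is an ordinary jar archive).
 */
public class JobFileInspector {

    /** True for entries such as "lib/foo.jar", i.e. jars bundled under lib/. */
    static boolean isBundledJar(String entryName) {
        return entryName.startsWith("lib/") && entryName.endsWith(".jar");
    }

    /** Opens the archive and collects the names of all bundled jars. */
    static List<String> bundledJars(String path) throws IOException {
        List<String> result = new ArrayList<>();
        try (JarFile jar = new JarFile(path)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isBundledJar(name)) {
                    result.add(name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JobFileInspector <path-to-.job-or-.jar>");
            return;
        }
        for (String name : bundledJars(args[0])) {
            System.out.println(name);
        }
    }
}
```

Running it against apache-mahout-examples-0.1-dev.job shows whether a Mahout core jar was packaged into the job file at all.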
I downloaded:

http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data

I added the file to the dfs:
[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop dfs -put
/home/philippe/synthetic_control.data testdata
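The job counters further down report Map input records=600 for the first job, i.e. one record per line of synthetic_control.data. As a rough sketch of what turning one such line into a numeric vector involves, assuming the values on a line are separated by runs of whitespace (this is not Mahout's InputDriver code, and SyntheticControlLine is a made-up name), consider:

```java
import java.util.Arrays;

/**
 * Hypothetical helper: parses one line of synthetic_control.data into a
 * double[], assuming values are separated by runs of whitespace.
 */
public class SyntheticControlLine {

    static double[] parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0];
        }
        String[] tokens = trimmed.split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Double.parseDouble throws NumberFormatException on a token such as "["
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the real data file.
        System.out.println(Arrays.toString(parse("  1.5  2.25\t3.0 ")));
    }
}
```

Note that Double.parseDouble("[") fails with exactly the message NumberFormatException: For input string: "[", the same one that appears in the reduce-side failure reported in this thread.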

I ran the example jar, but it failed:

[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar
/home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
08/10/27 15:34:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:34:55 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/27 15:34:55 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/27 15:34:55 INFO mapred.JobClient: Running job: job_200810271532_0001
08/10/27 15:34:56 INFO mapred.JobClient:  map 0% reduce 0%
08/10/27 15:34:59 INFO mapred.JobClient: Job complete: job_200810271532_0001
08/10/27 15:34:59 INFO mapred.JobClient: Counters: 7
08/10/27 15:34:59 INFO mapred.JobClient:   File Systems
08/10/27 15:34:59 INFO mapred.JobClient:     HDFS bytes read=291644
08/10/27 15:34:59 INFO mapred.JobClient:     HDFS bytes written=323660
08/10/27 15:34:59 INFO mapred.JobClient:   Job Counters
08/10/27 15:34:59 INFO mapred.JobClient:     Launched map tasks=2
08/10/27 15:34:59 INFO mapred.JobClient:     Data-local map tasks=2
08/10/27 15:34:59 INFO mapred.JobClient:   Map-Reduce Framework
08/10/27 15:34:59 INFO mapred.JobClient:     Map input records=600
08/10/27 15:34:59 INFO mapred.JobClient:     Map input bytes=288374
08/10/27 15:34:59 INFO mapred.JobClient:     Map output records=600
08/10/27 15:34:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:35:00 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:00 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:00 INFO mapred.JobClient: Running job: job_200810271532_0002
08/10/27 15:35:01 INFO mapred.JobClient:  map 0% reduce 0%
08/10/27 15:35:10 INFO mapred.JobClient:  map 100% reduce 0%
08/10/27 15:35:16 INFO mapred.JobClient: Job complete: job_200810271532_0002
08/10/27 15:35:16 INFO mapred.JobClient: Counters: 16
08/10/27 15:35:16 INFO mapred.JobClient:   File Systems
08/10/27 15:35:16 INFO mapred.JobClient:     HDFS bytes read=323660
08/10/27 15:35:16 INFO mapred.JobClient:     HDFS bytes written=1447
08/10/27 15:35:16 INFO mapred.JobClient:     Local bytes read=1389
08/10/27 15:35:16 INFO mapred.JobClient:     Local bytes written=37878
08/10/27 15:35:16 INFO mapred.JobClient:   Job Counters
08/10/27 15:35:16 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/27 15:35:16 INFO mapred.JobClient:     Launched map tasks=2
08/10/27 15:35:16 INFO mapred.JobClient:     Data-local map tasks=2
08/10/27 15:35:16 INFO mapred.JobClient:   Map-Reduce Framework
08/10/27 15:35:16 INFO mapred.JobClient:     Reduce input groups=1
08/10/27 15:35:16 INFO mapred.JobClient:     Combine output records=29
08/10/27 15:35:16 INFO mapred.JobClient:     Map input records=600
08/10/27 15:35:16 INFO mapred.JobClient:     Reduce output records=1
08/10/27 15:35:16 INFO mapred.JobClient:     Map output bytes=943020
08/10/27 15:35:16 INFO mapred.JobClient:     Map input bytes=323660
08/10/27 15:35:16 INFO mapred.JobClient:     Combine input records=1760
08/10/27 15:35:16 INFO mapred.JobClient:     Map output records=1732
08/10/27 15:35:16 INFO mapred.JobClient:     Reduce input records=1
08/10/27 15:35:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:35:16 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:16 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:16 INFO mapred.JobClient: Running job: job_200810271532_0003
08/10/27 15:35:17 INFO mapred.JobClient:  map 0% reduce 0%
08/10/27 15:35:24 INFO mapred.JobClient:  map 100% reduce 0%
08/10/27 15:35:28 INFO mapred.JobClient: Job complete: job_200810271532_0003
08/10/27 15:35:28 INFO mapred.JobClient: Counters: 16
08/10/27 15:35:28 INFO mapred.JobClient:   File Systems
08/10/27 15:35:28 INFO mapred.JobClient:     HDFS bytes read=326554
08/10/27 15:35:28 INFO mapred.JobClient:     HDFS bytes written=1137260
08/10/27 15:35:28 INFO mapred.JobClient:     Local bytes read=1147358
08/10/27 15:35:28 INFO mapred.JobClient:     Local bytes written=2304490
08/10/27 15:35:28 INFO mapred.JobClient:   Job Counters
08/10/27 15:35:28 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/27 15:35:28 INFO mapred.JobClient:     Launched map tasks=2
08/10/27 15:35:28 INFO mapred.JobClient:     Data-local map tasks=2
08/10/27 15:35:28 INFO mapred.JobClient:   Map-Reduce Framework
08/10/27 15:35:28 INFO mapred.JobClient:     Reduce input groups=1
08/10/27 15:35:28 INFO mapred.JobClient:     Combine output records=0
08/10/27 15:35:28 INFO mapred.JobClient:     Map input records=600
08/10/27 15:35:28 INFO mapred.JobClient:     Reduce output records=600
08/10/27 15:35:28 INFO mapred.JobClient:     Map output bytes=1139660
08/10/27 15:35:28 INFO mapred.JobClient:     Map input bytes=323660
08/10/27 15:35:28 INFO mapred.JobClient:     Combine input records=0
08/10/27 15:35:28 INFO mapred.JobClient:     Map output records=600
08/10/27 15:35:28 INFO mapred.JobClient:     Reduce input records=600
08/10/27 15:35:28 INFO kmeans.KMeansDriver: Iteration 0
08/10/27 15:35:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/27 15:35:29 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:29 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/27 15:35:29 INFO mapred.JobClient: Running job: job_200810271532_0004
08/10/27 15:35:30 INFO mapred.JobClient:  map 0% reduce 0%
08/10/27 15:35:37 INFO mapred.JobClient:  map 100% reduce 0%
08/10/27 15:35:45 INFO mapred.JobClient: Task Id : attempt_200810271532_0004_r_000000_0, Status : FAILED
java.io.IOException: attempt_200810271532_0004_r_000000_0The reduce copier failed
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

The failed attempt's logs contain this:

2008-10-27 15:35:40,133 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 2524 bytes (2524 raw bytes) into RAM from attempt_200810271532_0004_m_000000_0
2008-10-27 15:35:40,134 INFO org.apache.hadoop.mapred.ReduceTask: Read 2524 bytes from map-output for attempt_200810271532_0004_m_000000_0
2008-10-27 15:35:40,134 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200810271532_0004_m_000000_0 -> (1358, 1158) from phil
2008-10-27 15:35:41,110 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2008-10-27 15:35:41,125 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
2008-10-27 15:35:41,173 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 2 segments...
2008-10-27 15:35:41,177 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2008-10-27 15:35:41,178 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
2008-10-27 15:35:41,197 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810271532_0004_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
Caused by: java.lang.NumberFormatException: For input string: "["
  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
  at java.lang.Double.parseDouble(Double.java:510)
  at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
  at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
  at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
  at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
  ... 1 more

2008-10-27 15:35:41,197 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
2008-10-27 15:35:41,198 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: attempt_200810271532_0004_r_000000_0The reduce copier failed
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

However, I can run the org.apache.mahout.clustering.kmeans unit tests
without problems.

I truly do not understand where the problem lies.
Thanks for the help.
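Reading the stack trace above, KMeansCombiner.reduce is invoked from ReduceTask$ReduceCopier.combineAndSpill, that is, on the reduce side during the in-memory merge, and it fails while decoding a value whose text contains a bare "[". One possible explanation, offered as a guess rather than a diagnosis: if Hadoop 0.18 runs the combiner more than once over the same key, the combiner can receive values it produced itself, so its input parser must accept its own output format. The sketch below illustrates that property with a made-up "count:v1,v2,..." encoding; the class PartialSum and the format are hypothetical and are not Mahout's KMeansCombiner.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch, not Mahout's KMeansCombiner: a partial cluster sum is
 * encoded as "count:v1,v2,...". A single mapper record uses the same encoding
 * with count = 1, so combine() can safely be applied to its own output.
 * Assumes well-formed input and vectors of equal length.
 */
public class PartialSum {

    /** Encodes a point count and a component-wise sum, e.g. "2:4.0,6.0". */
    static String encode(int count, double[] sum) {
        StringBuilder sb = new StringBuilder();
        sb.append(count).append(':');
        for (int i = 0; i < sum.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(sum[i]);
        }
        return sb.toString();
    }

    /** Adds up encoded partial sums; input and output share one format. */
    static String combine(List<String> values) {
        int count = 0;
        double[] sum = new double[0];
        for (String value : values) {
            String[] parts = value.split(":", 2);
            count += Integer.parseInt(parts[0]);
            if (parts[1].isEmpty()) {
                continue; // an empty vector contributes nothing to the sum
            }
            String[] tokens = parts[1].split(",");
            if (sum.length == 0) {
                sum = new double[tokens.length];
            }
            for (int i = 0; i < tokens.length; i++) {
                sum[i] += Double.parseDouble(tokens[i]);
            }
        }
        return encode(count, sum);
    }

    public static void main(String[] args) {
        String a = encode(1, new double[] {1.0, 2.0});
        String b = encode(1, new double[] {3.0, 4.0});
        String c = encode(1, new double[] {5.0, 6.0});
        // Combining everything at once and combining in two rounds must agree.
        String once = combine(Arrays.asList(a, b, c));
        String twice = combine(Arrays.asList(combine(Arrays.asList(a, b)), c));
        System.out.println(once + " / " + twice);
    }
}
```

Because a combined value looks exactly like a mapper value, the framework is free to apply the combine step zero, one, or several times; a reducer would then divide each component of the sum by the count to obtain the new centroid.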


On Sun, Oct 26, 2008 at 8:24 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Same Mahout code, though, right?
Can you provide details on how you were running it?


On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
Unfortunately, I went straight from 0.17.2 to 0.18.1. It was working on
0.17.2.



On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
Did this work with 0.18.0 or other prior versions for you?
On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:

Hi,
I just updated to Hadoop 0.18.1 and got a clean version of Mahout from
SVN. However, I am having problems with KMeans that can be traced down to:
2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
Caused by: java.lang.NumberFormatException: For input string: "["
  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
  at java.lang.Double.parseDouble(Double.java:510)
  at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
  at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
  at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
  at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
  ... 1 more
2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

This happens while running the synthetic_control.data example, but I have
the same problem with any other input data.

I am able to run other MapReduce jobs without problems.

Here is the output of the jar task:

[EMAIL PROTECTED]:/usr/local/hadoop$ bin/hadoop jar
/home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce 0%
08/10/25 19:09:31 INFO mapred.JobClient:  map 50% reduce 0%
08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes read=291644
08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes written=323660
08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
08/10/25 19:09:32 INFO mapred.JobClient:     Launched map tasks=2
08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map tasks=2
08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
08/10/25 19:09:32 INFO mapred.JobClient:     Map input records=600
08/10/25 19:09:32 INFO mapred.JobClient:     Map input bytes=288374
08/10/25 19:09:32 INFO mapred.JobClient:     Map output records=600
08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce 0%
08/10/25 19:09:37 INFO mapred.JobClient:  map 50% reduce 0%
08/10/25 19:09:39 INFO mapred.JobClient:  map 100% reduce 0%
08/10/25 19:09:44 INFO mapred.JobClient:  map 100% reduce 16%
08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes read=323660
08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes written=1447
08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes read=1389
08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes written=37878
08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/25 19:09:52 INFO mapred.JobClient:     Launched map tasks=2
08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map tasks=2
08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input groups=1
08/10/25 19:09:52 INFO mapred.JobClient:     Combine output records=29
08/10/25 19:09:52 INFO mapred.JobClient:     Map input records=600
08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output records=1
08/10/25 19:09:52 INFO mapred.JobClient:     Map output bytes=943020
08/10/25 19:09:52 INFO mapred.JobClient:     Map input bytes=323660
08/10/25 19:09:52 INFO mapred.JobClient:     Combine input records=1760
08/10/25 19:09:52 INFO mapred.JobClient:     Map output records=1732
08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input records=1
08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce 0%
08/10/25 19:09:56 INFO mapred.JobClient:  map 50% reduce 0%
08/10/25 19:09:58 INFO mapred.JobClient:  map 100% reduce 0%
08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes read=326554
08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes written=1137260
08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes read=1147358
08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes written=2304490
08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce tasks=1
08/10/25 19:10:02 INFO mapred.JobClient:     Launched map tasks=2
08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map tasks=2
08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input groups=1
08/10/25 19:10:02 INFO mapred.JobClient:     Combine output records=0
08/10/25 19:10:02 INFO mapred.JobClient:     Map input records=600
08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output records=600
08/10/25 19:10:02 INFO mapred.JobClient:     Map output bytes=1139660
08/10/25 19:10:02 INFO mapred.JobClient:     Map input bytes=323660
08/10/25 19:10:02 INFO mapred.JobClient:     Combine input records=0
08/10/25 19:10:02 INFO mapred.JobClient:     Map output records=600
08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input records=600
08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
08/10/25 19:10:03 INFO mapred.JobClient: Running job: job_200810251826_0013
08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce 0%
08/10/25 19:10:08 INFO mapred.JobClient:  map 50% reduce 0%
08/10/25 19:10:09 INFO mapred.JobClient:  map 100% reduce 0%
08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

I am not sure whether I am doing something wrong here.

Thanks for the help,

Philippe.


--------------------------
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
