Hi,
I am running clustering with Mahout on Amazon Elastic MapReduce, and the
canopy clustering step fails partway through. I am using 8 m1.xlarge
instances. Amazon configures each Extra Large instance as follows:
Extra Large Instance: 15 GB of memory, 8 EC2 Compute Units (4 virtual
cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit
platform.
The step that causes the errors is the canopy clustering step:
2009-12-08 09:52:03,057 INFO org.apache.mahout.clustering.canopy.CanopyDriver (main): Input: s3://mahout-output/xMDQYCbDtc/data Out: s3://mahout-output/xMDQYCbDtc/canopies Measure: org.apache.mahout.common.distance.EuclideanDistanceMeasure t1: 80.0 t2: 55.0 Vector Class: SparseVector
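For readers less familiar with the t1/t2 parameters in the CanopyDriver line above: in canopy clustering, a point within t1 of a canopy center joins that canopy, and a point within t2 (with t2 < t1) is tightly bound and can no longer start a canopy of its own. Below is a minimal single-machine Java sketch of that classic procedure. It assumes dense double[] vectors instead of Mahout's SparseVector, the class and method names (CanopySketch, canopies) are hypothetical, and it is not Mahout's MapReduce implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified canopy clustering sketch; not Mahout's CanopyDriver code.
public class CanopySketch {

    // Euclidean distance between two dense vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Classic canopy procedure: each canopy is returned as the list of its
    // member points, with the canopy center first. Requires t1 > t2.
    static List<List<double[]>> canopies(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("t1 must be greater than t2");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<List<double[]>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Take the next point that is not tightly bound as a new canopy center.
            double[] center = remaining.remove(0);
            List<double[]> canopy = new ArrayList<>();
            canopy.add(center);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = euclidean(center, p);
                if (d < t1) {
                    canopy.add(p);   // loosely bound: member of this canopy
                }
                if (d < t2) {
                    it.remove();     // tightly bound: can no longer start a canopy
                }
            }
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy 2-D points; thresholds taken from the CanopyDriver log line.
        List<double[]> points = List.of(
                new double[] {0, 0}, new double[] {10, 0},
                new double[] {200, 0}, new double[] {260, 0});
        List<List<double[]>> result = canopies(points, 80.0, 55.0);
        for (List<double[]> canopy : result) {
            System.out.println("canopy of size " + canopy.size());
        }
    }
}
```

With t1 = 80 and t2 = 55, the point at x = 260 lies 60 away from the center at x = 200: inside t1 but outside t2, so it joins that canopy and still seeds a third canopy of its own.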
The last few lines of the syslog are as follows:
2009-12-08 09:52:03,196 WARN org.apache.hadoop.mapred.JobClient (main): Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-12-08 09:52:04,014 INFO org.apache.hadoop.mapred.FileInputFormat (main): Total input paths to process : 105
2009-12-08 09:52:04,222 INFO org.apache.hadoop.mapred.FileInputFormat (main): Total input paths to process : 105
2009-12-08 09:52:07,301 INFO org.apache.hadoop.mapred.JobClient (main): Running job: job_200912080939_0002
2009-12-08 09:52:08,304 INFO org.apache.hadoop.mapred.JobClient (main): map 0% reduce 0%
2009-12-08 09:52:17,331 INFO org.apache.hadoop.mapred.JobClient (main): map 2% reduce 0%
2009-12-08 09:52:18,335 INFO org.apache.hadoop.mapred.JobClient (main): map 5% reduce 0%
2009-12-08 09:52:20,340 INFO org.apache.hadoop.mapred.JobClient (main): map 12% reduce 0%
2009-12-08 09:52:21,343 INFO org.apache.hadoop.mapred.JobClient (main): map 13% reduce 0%
2009-12-08 09:52:22,347 INFO org.apache.hadoop.mapred.JobClient (main): map 17% reduce 0%
2009-12-08 09:52:24,363 INFO org.apache.hadoop.mapred.JobClient (main): map 18% reduce 0%
2009-12-08 09:52:25,367 INFO org.apache.hadoop.mapred.JobClient (main): map 25% reduce 0%
2009-12-08 09:52:26,371 INFO org.apache.hadoop.mapred.JobClient (main): map 28% reduce 0%
2009-12-08 09:52:27,374 INFO org.apache.hadoop.mapred.JobClient (main): map 31% reduce 0%
2009-12-08 09:52:28,377 INFO org.apache.hadoop.mapred.JobClient (main): map 35% reduce 0%
2009-12-08 09:52:29,380 INFO org.apache.hadoop.mapred.JobClient (main): map 36% reduce 0%
2009-12-08 09:52:30,383 INFO org.apache.hadoop.mapred.JobClient (main): map 39% reduce 0%
2009-12-08 09:52:31,386 INFO org.apache.hadoop.mapred.JobClient (main): map 43% reduce 0%
2009-12-08 09:52:32,388 INFO org.apache.hadoop.mapred.JobClient (main): map 45% reduce 0%
2009-12-08 09:52:33,392 INFO org.apache.hadoop.mapred.JobClient (main): map 57% reduce 0%
2009-12-08 09:52:34,395 INFO org.apache.hadoop.mapred.JobClient (main): map 62% reduce 0%
2009-12-08 09:52:35,399 INFO org.apache.hadoop.mapred.JobClient (main): map 69% reduce 0%
2009-12-08 09:52:36,409 INFO org.apache.hadoop.mapred.JobClient (main): map 79% reduce 0%
2009-12-08 09:52:37,413 INFO org.apache.hadoop.mapred.JobClient (main): map 82% reduce 0%
2009-12-08 09:52:38,417 INFO org.apache.hadoop.mapred.JobClient (main): map 90% reduce 0%
2009-12-08 09:52:39,420 INFO org.apache.hadoop.mapred.JobClient (main): map 99% reduce 0%
2009-12-08 09:52:42,432 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_200912080939_0002_m_000104_0, Status : FAILED
2009-12-08 09:52:43,531 INFO org.apache.hadoop.mapred.JobClient (main): map 99% reduce 6%
2009-12-08 09:52:48,544 INFO org.apache.hadoop.mapred.JobClient (main): map 99% reduce 13%
2009-12-08 09:52:48,544 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_200912080939_0002_m_000104_1, Status : FAILED
2009-12-08 09:52:53,564 INFO org.apache.hadoop.mapred.JobClient (main): map 99% reduce 15%
2009-12-08 09:52:54,567 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_200912080939_0002_m_000104_2, Status : FAILED
2009-12-08 09:52:58,605 INFO org.apache.hadoop.mapred.JobClient (main): map 99% reduce 22%
I am new to Hadoop and Mahout and would appreciate some help. It seems
that one of the map tasks (task m_000104) fails repeatedly. Is it because
the file size is too big, or because there are too many input paths?
Thanks!